Comprehensive Analysis of a Multidimensional Liquid Chromatography Mass Spectrometry Dataset Acquired on a Quadrupole Selecting, Quadrupole Collision Cell, Time-of-flight Mass Spectrometer

Robert J. Chalkley(University of California, San Francisco), Peter R. Baker(University of California, San Francisco), Lan Huang(University of California, San Francisco), Kirk C. Hansen(University of California, San Francisco), Nadia P.C. Allen(Stanford University), Michael Rexach(Stanford University), Alma L. Burlingame(University of California, San Francisco)
Molecular & Cellular Proteomics
June 3, 2005
Cited by 185 · Open Access

Abstract

A thorough analysis of the protein interaction partners of the yeast GTPase Gsp1p was carried out by a multidimensional chromatography strategy of strong cation exchange fractionation of peptides followed by reverse phase LC-ESI-MSMS using a QSTAR instrument. This dataset was then analyzed using the latest developmental version of Protein Prospector. The Prospector search results were also compared with results from the search engine “Mascot” using a new results comparison program within Prospector named “SearchCompare.” The results from this study demonstrate that the high quality data produced on a quadrupole selecting, quadrupole collision cell, time-of-flight (QqTOF) geometry instrument allows for confident assignment of the vast majority of interpretable spectra by current search engines.

Modern mass spectrometers are able to produce large amounts of information-rich data in relatively short periods of time. The bottleneck in mass spectrometry-based peptide and protein identification is now at the stage of data analysis and verification of results.
There are several search engines available that can analyze large datasets in a batch fashion, most notably Mascot (www.matrixscience.com) and Sequest (1. Eng J.K. McCormack A.L. Yates J.R. An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. J. Am. Soc. Mass Spectrom. 1994; 5: 976-989). Although it would be desirable to quote results from such searches without needing to inspect and evaluate the raw data, this is not without risk at the moment: although both use probability-based scoring systems, the reliability of results from Sequest is known to be problematic (2. Peng J. Elias J.E. Thoreen C.C. Licklider L.J. Gygi S.P. Evaluation of multidimensional chromatography coupled with tandem mass spectrometry (LC/LC-MS/MS) for large-scale protein analysis: the yeast proteome. J. Proteome Res. 2003; 2: 43-50; 3. Von Haller P.D. Yi E. Donohoe S. Vaughn K. Keller A. Nesvizhskii A.I. Eng J. Li X.J. Goodlett D.R. Aebersold R. Watts J.D. The application of new software tools to quantitative protein profiling via isotope-coded affinity tag (ICAT) and tandem mass spectrometry: II. Evaluation of tandem mass spectrometry methodologies for large-scale protein analysis, and the application of statistical tools for data analysis and interpretation. Mol. Cell. Proteomics. 2003; 2: 428-442), and no extensive study of the performance of Mascot on large datasets has been published. Hence a number of groups have developed statistical analysis programs for evaluating these search results to better define the reliability of the reported matches (4. Anderson D.C. Li W. Payan D.G. Noble W.S. A new algorithm for the evaluation of shotgun peptide sequencing in proteomics: support vector machine classification of peptide MS/MS spectra and SEQUEST scores. J. Proteome Res. 2003; 2: 137-146; 5. Keller A. Nesvizhskii A.I. Kolker E. Aebersold R. Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. Anal. Chem. 2002; 74: 5383-5392; 6. MacCoss M.J. Wu C.C. Yates III, J.R. Probability-based validation of protein identifications using a modified SEQUEST algorithm. Anal. Chem. 2002; 74: 5593-5599; 7. Moore R.E. Young M.K. Lee T.D. Qscore: an algorithm for evaluating SEQUEST database search results. J. Am. Soc. Mass Spectrom. 2002; 13: 378-386). In addition, the data analyzed by these search engines are not the raw data but rather peak centroided mass lists extracted from the raw data, and these lists do not always fully represent the information content of the raw data. A summary of the complications arising from automated peptide and protein identification has recently been published (8. Baldwin M.A. Protein identification by mass spectrometry: issues to be considered. Mol. Cell. Proteomics. 2004; 3: 1-9).

Protein Prospector is a suite of programs developed at the University of California, San Francisco for the analysis of proteomic data (www.prospector.ucsf.edu). Historically it has been one of the major programs in proteomic analysis; however, the current web version (version 4.0.5) cannot analyze multiple MSMS spectra simultaneously in a batch fashion, which limits its use in analyzing large datasets. Hence we have developed new programs within the Prospector framework specifically designed for large dataset analysis and comparison. The first of these is “Batch Tag,” which is based on the well established MS-Tag program but is able to analyze files containing large numbers of spectra from one or multiple sample fractions. The second is “SearchCompare,” a new program within Protein Prospector that is able to summarize and filter large dataset results. It also converts the peptide scores from Batch Tag into a new discriminant score.
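The idea of comparing the peptide assignments of two search engines spectrum by spectrum can be illustrated with a minimal sketch. This is not SearchCompare's implementation, which is not described at this level of detail here; the function name `compare_results`, the spectrum identifiers, and the peptide sequences are all hypothetical, and each engine's output is assumed to have been reduced to a mapping from spectrum identifier to its top-ranked peptide.

```python
from typing import Dict, NamedTuple, Tuple


class Comparison(NamedTuple):
    """Spectrum-level comparison of two search engines (hypothetical structure)."""
    agree: Dict[str, str]                  # same peptide reported by both engines
    disagree: Dict[str, Tuple[str, str]]   # both reported a peptide, but different ones
    only_first: Dict[str, str]             # matched by the first engine only
    only_second: Dict[str, str]            # matched by the second engine only


def compare_results(first: Dict[str, str], second: Dict[str, str]) -> Comparison:
    """Partition spectra by whether two engines agree on the assigned peptide."""
    agree: Dict[str, str] = {}
    disagree: Dict[str, Tuple[str, str]] = {}
    # Spectra that both engines assigned: check whether the peptides are identical.
    for spectrum_id in sorted(first.keys() & second.keys()):
        a, b = first[spectrum_id], second[spectrum_id]
        if a == b:
            agree[spectrum_id] = a
        else:
            disagree[spectrum_id] = (a, b)
    # Spectra assigned by only one engine are candidates for manual inspection.
    only_first = {s: p for s, p in first.items() if s not in second}
    only_second = {s: p for s, p in second.items() if s not in first}
    return Comparison(agree, disagree, only_first, only_second)


if __name__ == "__main__":
    # Made-up example data, for illustration only.
    engine_a = {"frac01.0001": "LVNELTEFAK", "frac01.0002": "AEFVEVTK", "frac01.0003": "YLYEIAR"}
    engine_b = {"frac01.0001": "LVNELTEFAK", "frac01.0002": "AEFVEITK", "frac01.0004": "DLGEEHFK"}
    result = compare_results(engine_a, engine_b)
    print("agree:", result.agree)
    print("disagree:", result.disagree)
    print("only first:", result.only_first)
    print("only second:", result.only_second)
```

Spectra in the agreeing group have independent support from both engines, whereas the disagreeing and single-engine groups are the ones most worth checking against the raw data.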
The scoring system used by Batch Tag simply gives a certain score for with the of the scoring based on the of a for for are for instrument then multiple the Batch Tag results to produce a new discriminant probability-based scoring can also and multiple search results and is able to analysis of It can produce of by search by search or in a search An of this program is its ability to results from both Prospector and Mascot a peptide is by both search engines is a of it a to the using be a matches that are by one search engine but not by the and can these which a of spectra by the The dataset analyzed and to evaluate these new Protein Prospector is of an study of protein into and out of the by analyzing to of the A. analysis of Chem. Scholar, A. of protein at the Cell. Proteomics. 2002; Scholar, M.A. J. A.L. The identification of of the of using high time-of-flight tandem mass Cell. Proteomics. 2002; Scholar). There are of that specifically into the and out of the The interaction of and with is by the GTPase Gsp1p as in In its it of in its it by of a the and Gsp1p is able to A.L. The as a for at the Scholar, the of of Chem. 2002; Scholar, S. of and in and out of the Scholar). In this we to the in protein at the as the yeast the by at and with using the J. M.A. A.L. Mass analysis of protein at using affinity tag and multidimensional Cell. Proteomics. 2003; 2: Scholar). data were the of for of using the J. M.A. A.L. Mass analysis of protein at using affinity tag and multidimensional Cell. Proteomics. 2003; 2: Scholar). 
In in to analyzing the peptides for quantitative we also analyzed the peptides to better the sample and peptide identifications to the one or that are to a protein from the a peptides were in the arising from in the This was to the of sample is with of with high peptide to peptides not from the a large of data was on the peptides which a of was It is the data from these peptides that are in this database which we use to the performance of the new Prospector analysis software and its performance to that of a available search Gsp1p was and from as published A. analysis of Chem. Scholar). were at the stage of the using for or at phase using for and then were as published A. of protein at the Cell. Proteomics. 2002; Scholar). from both were with the to a of of and this was analyzed as in published for of J. M.A. A.L. Mass analysis of protein at using affinity tag and multidimensional Cell. Proteomics. 2003; 2: Scholar). were in and with and then of were with phase were with peptides were by strong cation exchange using a system with an was using a A A was acid and was A containing were and of these was the affinity of was and then peptides were into one using were in was using to the and then analyzed by reverse phase phase chromatography was using an system and a was using a at a of A was was The was peptides the were on into an used quadrupole selecting, quadrupole collision cell, instrument and were analyzed using to MSMS a to be for MSMS spectral a was was used for the to its lists of MSMS spectra from the were using the Mascot within that the data by data in the spectra within of to and data within of in the MSMS spectra were to The peak lists from were with Protein Prospector or Mascot (version searches on Prospector the mass from the to the peak was into and the most in of the were used for Mascot the raw peak and of the peak in an fashion. 
were carried out for mass accuracy for the and mass accuracy for of protein and the amino acid is a were as from search were and these were then analyzed and compared using analysis of the cation exchange of the from yeast a of MSMS spectra were spectra were using a new program in Protein Prospector called Batch Tag the database yeast one or and Batch Tag is based on but it can as its files from in several the Mascot lists from spectra can large numbers of to a of Hence Prospector the MSMS peak lists to It the and a to and on the mass of Protein Prospector is also able to certain as from rather a peptide A of the in QSTAR spectra is in the mass of the to the majority of the at which are at are Hence Protein Prospector in and the most in of the to search with This scoring and protein it be to the into or numbers of to however, this was not in this Prospector a scoring system of for are based on the of and the of in peptides on QSTAR instrument and are in in Batch of or of or of or from of of or from of in a new the peptide identification results such that the scoring for was reported with a number of to the search such as the Batch Tag number of peptides to the and the in score this and the scoring assignment This search with peptides of relatively high from that were to be in the spectra were and extensive and then analyzed spectra and spectra that matches to the these spectra not been to This a of for of this analysis are in the study A.L. analysis of a multidimensional chromatography mass spectrometry dataset on a quadrupole selecting, quadrupole collision cell, time-of-flight mass II. in Protein Prospector for and analysis of large Cell. Proteomics. Scholar). of the dataset we were able to produce a of peptides to of the of to the Protein Prospector Batch Tag search results of yeast and known were made by Batch Tag one for it of the spectra we as In spectra such as on a instrument is no to and be by high or K. 
A comparison of mass spectra of peptides with a mass with from a tandem mass J. Mass Spectrom. Scholar). a peptide scoring system for data has to score peptides with the It is also not always to and or and as these have This dataset was then the database and also the for The search the search The numbers of matches in these searches the number of in the were yeast of a of protein in the database Thus, the that of the spectra in this dataset to peptides from that were not in the database but were in the are reported in the search to the of an of database the scores of The that a peptide is the scoring to a not that the is to be A peptide identification as of an analysis of a is not an and peptides have been from the protein this assignment is to be we to this as a in to use the score for a peptide from a protein as a in a scoring a a with a score of and this is the to this then would be used as the scoring to this in the dataset a peptide in the protein with a score of then would be used as a for a new score for the that analysis of the search results also that a high in score the and a was a for a the score spectra with large numbers of in spectra with most of these were to from the to use the to a to the score the and matches to a to the and as such not be It be that as we and peptides was were with the in matches were from a we to the peptide score for a protein with the score for the in to the to a new score that is and the Batch Tag score. 
the scoring matches for from a search of the database of into the statistical and which matches were to then the of the to the ability of the score to and the as of these for and peptide score This that the are of for and a of the the discriminant scoring system and the results at the is or This that using a of as the for the were that were as and results that were reported as The of the discriminant score on the matches to a is in which the results for the at in In this the peptide identification was the to the discriminant but the score was not better the matches by the score with the peptide score for the protein to the protein as the a score of the new discriminant score the as a confident the matches scores and discriminant scores for the matches to the MSMS at in the first cation exchange peptide in a new that the discriminant score and the Batch Tag peptide a of peptide scores a and the discriminant score one to results and one to results. the discriminant score to on the of the peptide score for a protein or the score that the discriminant it that of these is as as the of the which a scoring system from which a for and matches can be the of discriminant scoring peptides for was compared with peptide assignment were peptide matches of a compared with that were by peptide score Thus, the discriminant scoring of the spectra as The results of this search are in a of for the search of this dataset to a discriminant score of Protein Prospector matches to be of which were and were and of we this dataset spectra of the were compared with by peptide score spectra were reported as at score of with and a of peptide score discriminant score. This that is a peptide and discriminant score one would but the is as one peptide score. 
it that the of peptide and discriminant scores does not and at the peptide results with the discriminant scoring is a for for the at in cation exchange as the discriminant scoring with a discriminant score of which it with high is the as which is the scoring in Batch Tag to the discriminant scoring The peptide matches for this to the peptides are in which that the to the peptide is better it of the and the The the peptide is as is to the scoring peptide from a peptides are from that are in the The from protein has a scoring peptide of the discriminant scoring is to a peptide from which has the scoring peptide in the scoring Thus, is a in the to peptides from in the This is an that is also in Mascot but not be as in Prospector the are compared with in This does not at the protein that be is this new scoring better at analyzing of a as this is for analysis of data. the spectra were algorithm that and for spectra were of were of and were of The of results at that is not a the reliability of results for and and are not results for to statistical The reliability of results for is by the that Prospector is able to from the it can with and reliability of peptide at of in a new A approach to estimate the of a database search is to search the dataset a database (2Peng J. Elias J.E. Thoreen C.C. Licklider L.J. Gygi S.P. Evaluation of multidimensional chromatography coupled with tandem mass spectrometry (LC/LC-MS/MS) for large-scale protein analysis: the yeast proteome.J. Proteome. Res. 2003; 2: 43-50Google Scholar, A. S. A. A. W. the of with by time-of-flight mass spectrometry and Chem. Scholar). 
The is made that matches are by using the scoring used for results from the database a of the can be this dataset a version of the database matches the and the to and at the A the for of the the peptide was or to the The scoring the database was from the at in cation exchange which the The for this is The database and which are in the to the also and The scoring peptide of the in the database search was to a at in the cation exchange and the from the of the in the database is the peptide from the database and database and search this dataset then the number of peptides reported from the database is from that by the the and the This the of in the database on the and a database does not produce an estimate of the using the discriminant scoring of Protein Prospector. most the information is to the search engine at the protein rather at the peptide it is the protein results that are used for of the for analysis the reliability of the peptide identifications is Protein Prospector is not using a protein scoring the peptide discriminant score is using information better peptides to the protein these this score also well at the protein simply by protein matches to have a peptide a certain The of the is to on the the is to a identifications in for the reliability of the results the performance of the discriminant scoring at containing a peptide at and contains a comparison with the performance of Mascot on this dataset the The Mascot search was on an version of and were used that are not used on the web version of the a peptide score of was for a to be the was to this that that at one peptide that is and it is the to the and it is the first this has been reported in the search are The of a peptide score was a for results using of Mascot and Mascot version a new scoring system for large datasets that using this score in of results. in the of the for to a peptide is a filter for the reliability of results. Mascot a of for protein results. 
this dataset Mascot reported a score of as or extensive Prospector and Mascot of this in a new that of the reported by Mascot are This the performance of Mascot and that reported can be by the in a of the that we have as for Mascot are that are to that are in the sample of the peptide matches to this protein are but the peptide matches to this protein are Although these matches are not do to these matches to the Mascot scoring as its scoring model Protein Prospector the database and using a peptide for protein matches a Protein Prospector to out of the protein that peptides from a protein but at one are at the The of reported protein matches that are the reported of are protein of the matches were reported on the of one peptide of the multiple peptides the protein Protein Prospector but with reliability Mascot the matches of a peptide Protein Prospector matches Mascot with the number of the protein was in Protein Prospector to the with one peptide in the protein then at both Protein Prospector Mascot in of and has a of The results for the database demonstrate that the discriminant scoring does an at protein identifications but that protein are made in comparison to It also that a large number of are The large of in protein in in protein not to of these protein we are protein It is within the of that protein identifications on the of one peptide are Aebersold R. M.A. A.L. K. Nesvizhskii A.I. The need for in of peptide and protein identification Protein Cell. Proteomics. 2004; 3: Scholar). In the results from this dataset are protein identifications reported the by Protein Prospector on the of a peptide the reported protein identifications from Protein Prospector assignment is on the of a peptide Thus, we that from these which is an approach used by to reliability of data protein identification using in mass Cell. Proteomics. Scholar, peptide identification in by of mass S. A. 2004; Scholar), in this a of protein results. 
it would also protein of the the and the database would and high the and and from the protein database at the of the reported from the are and of are the Thus, these numbers with the in the database at the of Protein Prospector this that peptide protein would produce results. at the in the protein for of these can be at the in Protein Prospector are of the were by spectra for which the was not in for one the is a peptide by a and for the the software the as the mass Protein Prospector a peptide to the This that the matches are not an a better rather the is not an has a one can which one in the results The results can be reported as a web or it can be as a that allows into or It also be in that is able to quantitative analysis of M.A. A.L. A protein algorithm within Protein in of the of the for Mass for Mass Scholar). The of the results also the of the discriminant scores one to well the scoring and It is from these that the that an is performance of this discriminant scoring system is on a large number of data such that it can model the of and the discriminant score for a dataset for comparison. This dataset was as of a analysis of in the of one of a of with R. A.L. The in in the of by J. 2004; Scholar). This also on a QSTAR MSMS In this dataset a discriminant score of to a for peptide and using this peptides from were The discriminant scoring the from the are results contains large numbers of peptides that do not have as well as containing The of the can be but the for the is to model such that the for be In this of it be to quote a of an This also be analyzing datasets of the of discriminant scores can a estimate of the reliability of a of discriminant scores for a datasets within the are peptide Protein Prospector the to a and the and for and from the for the a of the containing the is new software within Protein Prospector that allows the analysis of large It can analyze multiple in one analysis and of the results. 
performance has been compared with the current search engine for analysis, and the results that using a for the protein results a peptide at Protein Prospector is able to Mascot and at a a of Prospector matches at a the performance of a search engine it is to a dataset with which to its approach has been to of of and then to one of these is and are A.I. Keller A. Kolker E. Aebersold R. A statistical model for by tandem mass Chem. 2003; Scholar). this is not of the of sample that is analyzed in multidimensional chromatography of be in a Hence the approach we was to a dataset from an within the that of a of and then analyze the data to for these results with by search engines. this analysis we of the of a and it is to and based on Hence peptide results these are to be as and also have the and at the mass accuracy that these data was the search engine is not to be able to these although one the data by at the mass then mass accuracy of data is to the not to of these both as This was not a major this dataset is of were at the of of spectra is to In this dataset are a number of spectra that are and we as to be able to a confident for of these spectra the search engines that a in the results. of these be but we were not the spectra were in a to an results to the of the and reliability of peptide protein In the protein results reported we of the reported are of these are the the results this also a of the this of protein on the of peptide is recently published datasets of protein identifications were peptide identifications K. S. and in by shotgun Chem. 2004; Scholar). This a for as to to to these peptide protein we in dataset of are Although the performance of the new Protein Prospector scoring is are of the ability of the The Batch Tag scoring of for a are for the of analyzing the in this and large datasets on a instrument it be to the scoring that is the for the discriminant score. 
The for be instrument have a dataset of the sample analyzed in this study that was on a instrument results that by using a of for the discriminant scoring well on this The majority of the large datasets published from multidimensional chromatography of have been on The high of spectra by Prospector in this study is in to most published dataset of high data and of the spectra be (2Peng J. Elias J.E. Thoreen C.C. Licklider L.J. Gygi S.P. Evaluation of multidimensional chromatography coupled with tandem mass spectrometry (LC/LC-MS/MS) for large-scale protein analysis: the yeast proteome.J. Proteome. Res. 2003; 2: 43-50Google Scholar, 3Von Haller P.D. Yi E. Donohoe S. Vaughn K. Keller A. Nesvizhskii A.I. Eng J. Li X.J. Goodlett D.R. Aebersold R. Watts J.D. The application of new software tools to quantitative protein profiling via isotope-coded affinity tag (ICAT) and tandem mass spectrometry: II. Evaluation of tandem mass spectrometry methodologies for large-scale protein analysis, and the application of statistical tools for data analysis and interpretation.Mol. Cell. Proteomics. 2003; 2: 428-442Google Scholar, statistical for tandem mass Chem. 2003; Scholar), although one study has reported identification K. S. and in by shotgun Chem. 2004; Scholar). This is to be a of the reliability of results with search engines but rather a of the quality of data on a instrument in comparison to that on an both in of mass accuracy and the of a mass in the the of for the number of for yeast has a well with relatively compared with spectra in a multidimensional analysis are on a QSTAR instrument. the number of interpretable spectra from the be the number of spectra does not produce this is information for the at large to the of and the use of for data to the that are new to the and are that that of spectra in proteomic have no in the database A approach to the of tandem mass to peptides of and 2003; Scholar). 
this database that mass spectrometers can produce high quality data from which high matches can be made from the majority of the data. have a new of software tools that analysis of large performance has been to be to not better the current for data, In the we to these new software tools available to the the of and of the in the of these data. with files

