Xiaojun Li

Identification and quantification of N-linked glycoproteins using hydrazide chemistry, stable isotope labeling and mass spectrometry

Hui Zhang, Xiaojun Li, Daniel Martin et al.|Nature Biotechnology|2003

Cited by 1.4k

Automated Statistical Analysis of Protein Abundance Ratios from Data Generated by Stable-Isotope Dilution and Tandem Mass Spectrometry

Xiaojun Li, Hui Zhang, Jeffrey A. Ranish et al.|Analytical Chemistry|2003

Cited by 341

We describe an algorithm for the automated statistical analysis of protein abundance ratios (ASAPRatio) of proteins contained in two samples. Proteins are labeled with distinct stable-isotope tags and fragmented, and the tagged peptide fragments are separated by liquid chromatography (LC) and analyzed by electrospray ionization (ESI) tandem mass spectrometry (MS/MS). The algorithm utilizes the signals recorded for the different isotopic forms of peptides of identical sequence and numerical and statistical methods, such as Savitzky-Golay smoothing filters, statistics for weighted samples, and Dixon's test for outliers, to evaluate protein abundance ratios and their associated errors. The algorithm also provides a statistical assessment to distinguish proteins of significant abundance changes from a population of proteins of unchanged abundance. To evaluate its performance, two sets of LC-ESI-MS/MS data were analyzed by the ASAPRatio algorithm without human intervention, and the data were related to the expected and manually validated values. The utility of the ASAPRatio program was clearly demonstrated by its speed and the accuracy of the generated protein abundance ratios and by its capability to identify specific core components of the RNA polymerase II transcription complex within a high background of copurifying proteins.
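The abstract above names two of the statistical ingredients used to combine peptide-level ratios: statistics for weighted samples and Dixon's test for outliers. The following is a minimal sketch of how those two pieces might fit together, not the ASAPRatio implementation itself. The function names (`dixon_q`, `weighted_mean_and_error`, `combine_ratios`), the use of log10 ratios, and the single fixed threshold `q_critical` are all assumptions made for illustration; in practice the critical value of Dixon's Q depends on the sample size and the chosen confidence level.

```python
import math


def dixon_q(values):
    """Return (Q, index) for the most extreme value in a small sample.

    Q is the gap between the suspect value and its nearest neighbour,
    divided by the full range of the sample.
    """
    if len(values) < 3:
        raise ValueError("Dixon's test needs at least 3 values")
    order = sorted(range(len(values)), key=lambda i: values[i])
    lowest, highest = values[order[0]], values[order[-1]]
    spread = highest - lowest
    if spread == 0:
        return 0.0, order[0]
    gap_low = values[order[1]] - lowest
    gap_high = highest - values[order[-2]]
    if gap_low > gap_high:
        return gap_low / spread, order[0]
    return gap_high / spread, order[-1]


def weighted_mean_and_error(values, weights):
    """Weighted mean and weighted standard deviation of the values."""
    total = sum(weights)
    mean = sum(w * v for v, w in zip(values, weights)) / total
    variance = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total
    return mean, math.sqrt(variance)


def combine_ratios(ratios, weights, q_critical):
    """Combine peptide ratios into one ratio (hypothetical helper).

    Works on log10 ratios, repeatedly drops the most extreme value while
    its Q statistic exceeds q_critical (a simplification: the same
    threshold is reused for every sample size), then returns the
    back-transformed weighted mean, the spread in log10 units, and the
    number of ratios kept.
    """
    logs = [math.log10(r) for r in ratios]
    kept_weights = list(weights)
    while len(logs) >= 3:
        q, idx = dixon_q(logs)
        if q <= q_critical:
            break
        del logs[idx]
        del kept_weights[idx]
    mean, spread = weighted_mean_and_error(logs, kept_weights)
    return 10 ** mean, spread, len(logs)


if __name__ == "__main__":
    ratio, spread, kept = combine_ratios(
        [1.0, 1.1, 0.9, 1.05, 10.0], [1, 1, 1, 1, 1], q_critical=0.71
    )
    print(f"ratio={ratio:.3f} log10 spread={spread:.3f} kept={kept}")
```

Working in log space treats a twofold increase and a twofold decrease symmetrically, which is one common choice for abundance ratios; whether the published algorithm does the same is not stated in the abstract.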

Serum Biomarkers Identify Patients Who Will Develop Inflammatory Bowel Diseases Up to 5 Years Before Diagnosis

Joana Torres, Francesca Petralia, Takahiro Sato et al.|Gastroenterology|2020

Cited by 241

High Throughput Quantitative Analysis of Serum Proteins Using Glycopeptide Capture and Liquid Chromatography Mass Spectrometry

Hui Zhang, Eugene C. Yi, Xiaojun Li et al.|Molecular & Cellular Proteomics|2004

Cited by 208 | Open Access

It is expected that the composition of the serum proteome can provide valuable information about the state of the human body in health and disease and that this information can be extracted via quantitative proteomic measurements. Suitable proteomic techniques need to be sensitive, reproducible, and robust to detect potential biomarkers below the level of highly expressed proteins, generate data sets that are comparable between experiments and laboratories, and have high throughput to support statistical studies. Here we report a method for high throughput quantitative analysis of serum proteins. It consists of the selective isolation of peptides that are N-linked glycosylated in the intact protein, the analysis of these now deglycosylated peptides by liquid chromatography electrospray ionization mass spectrometry, and the comparative analysis of the resulting patterns. By focusing selectively on a few formerly N-linked glycopeptides per serum protein, the complexity of the analyte sample is significantly reduced and the sensitivity and throughput of serum proteome analysis are increased compared with the analysis of total tryptic peptides from unfractionated samples. We provide data that document the performance of the method and show that sera from untreated normal mice and genetically identical mice with carcinogen-induced skin cancer can be unambiguously discriminated using unsupervised clustering of the resulting peptide patterns. We further identify, by tandem mass spectrometry, some of the peptides that were consistently elevated in cancer mice compared with their control littermates.

There is growing interest in testing the hypothesis that the serum proteome contains protein biomarkers that are useful for classifying the physiological or pathological status of an individual. (In this paper, the term serum is used to indicate serum or plasma.) Such markers are expected to be useful for the prediction, detection, and diagnosis of disease as well as to follow the efficacy, toxicology, and side effects of drug treatment (1 Wulfkuhle J.D. Liotta L.A. Petricoin E.F. Proteomic applications for the early detection of cancer. Nat. Rev. Cancer. 2003; 3: 267-275).
The idea of reading diagnostic or prognostic signatures from human body fluids is neither new nor original. Early attempts using high resolution two-dimensional gel electrophoresis were described more than 2 decades ago (2 Anderson L. Anderson N.G. High resolution two-dimensional electrophoresis of human plasma proteins. Proc. Natl. Acad. Sci. U. S. A. 1977; 74: 5421-5425; 3 Merril C.R. Goldman D. Sedman S.A. Ebert M.H. Ultrasensitive stain for proteins in polyacrylamide gels shows regional variation in cerebrospinal fluid proteins. Science. 1981; 211: 1437-1438; 4 Merril C.R. Switzer R.C. Van Keuren M.L. Trace polypeptides in cellular extracts and human body fluids detected by two-dimensional electrophoresis and a highly sensitive silver stain. Proc. Natl. Acad. Sci. U. S. A. 1979; 76: 4335-4339). Renewed interest in this idea has emerged due to recent advances in proteomic technologies (5 Aebersold R. Mann M. Mass spectrometry-based proteomics. Nature. 2003; 422: 198-207), intriguing initial results from analyzing serum protein patterns using mass spectrometry (1 Wulfkuhle J.D. Liotta L.A. Petricoin E.F. Proteomic applications for the early detection of cancer. Nat. Rev. Cancer. 2003; 3: 267-275), and the clinical validation and use of a number of diagnostic disease markers including CA125 for ovarian cancer, prostate-specific antigen for prostate cancer, and carcinoembryonic antigen for colon, breast, pancreatic, and lung cancer (6 Diamandis E.P. Mass spectrometry as a diagnostic and a cancer biomarker discovery tool: opportunities and potential limitations. Mol. Cell. Proteomics. 2004; 3: 367-378). A number of new approaches that differ from the traditional two-dimensional gel electrophoresis method for the discovery of protein biomarkers in serum have recently been described (1 Wulfkuhle J.D. Liotta L.A. Petricoin E.F. Proteomic applications for the early detection of cancer. Nat.
Rev. Cancer. 2003; 3: 267-275Google Scholar). These include surface-enhanced laser desorption ionization mass spectrometry (SELDI-MS) 2The abbreviations used are: SELDI, surface-enhanced laser desorption ionization; MS, mass spectrometry; LC, liquid chromatography; ESI, electrospray ionization; MS/MS, tandem mass spectrometry; MALDI, matrix-assisted laser desorption ionization; CID, collision-induced dissociation; TOF, time-of-flight; QTOF, quadrupole time-of-flight; CV, coefficient of variance; DMBA, 7,12-dimethylbenz[a]anthracene; HPLC, high performance liquid chromatography. (7Petricoin E.F. Ardekani A.M. Hitt B.A. Levine P.J. Fusaro V.A. Steinberg S.M. Mills G.B. Simone C. Fishman D.A. Kohn E.C. Liotta L.A. Use of proteomic patterns in serum to identify ovarian cancer..Lancet. 2002; 359: 572-577Google Scholar), liquid chromatography tandem mass spectrometry (LC-MS/MS) of serum proteome digests (8Adkins J.N. Varnum S.M. Auberry K.J. Moore R.J. Angell N.H. Smith R.D. Springer D.L. Pounds J.G. Toward a human blood serum proteome: analysis by multidimensional separation coupled with mass spectrometry..Mol. Cell. Proteomics. 2002; 1: 947-955Google Scholar, 9Tirumalai R.S. Chan K.C. Prieto D.A. Issaq H.J. Conrads T.P. Veenstra T.D. Characterization of the low molecular weight human serum proteome..Mol. Cell. Proteomics. 2003; 2: 1096-1103Google Scholar, 10Shen Y. Jacobs J.M. Camp II, D.G. Fang R. Moore R.J. Smith R.D. Xiao W. Davis R.W. Tompkins R.G. Ultra-high-efficiency strong for high of the human plasma 2004; 76: Scholar), or protein separation by S. liquid in Sci. 2003; Scholar, S. approaches to the of biomarkers for 2002; Scholar), of the serum proteome on by matrix-assisted laser desorption ionization mass spectrometry D. E.C. peptide by sample and mass 2004; 76: Scholar), and and of these of the serum proteome is with the of serum samples. human blood serum is to of of of protein that a of an of Anderson N.G. The human plasma proteome: and diagnostic Cell. 
Proteomics. 2002; 1: Scholar). the serum proteome is by a few highly proteins, the human serum proteins of total protein mass R.S. Chan K.C. Prieto D.A. Issaq H.J. Conrads T.P. Veenstra T.D. Characterization of the low molecular weight human serum proteome..Mol. Cell. Proteomics. 2003; 2: 1096-1103Google Scholar). of total serum protein mass is by protein, of the serum proteins show two-dimensional that are with the of protein Anderson N.G. The human plasma proteome: and diagnostic Cell. Proteomics. 2002; 1: Scholar). protein from two-dimensional of serum were by mass spectrometry, to protein on were as of the R. C.R. A.M. M. R. Anderson S. The human serum proteome: of protein on two-dimensional electrophoresis gels and of 2003; 3: Scholar). the serum proteome in an and in a for serum proteome analysis have the to detect low quantitative to in the proteome and to detect in a of to the to identify peptides for their on and of results from and and high sample throughput to support with statistical Here we a new method for quantitative serum proteome It is on the selective isolation of peptides from serum proteins that are N-linked glycosylated in the protein and the analysis of the peptide the now deglycosylated of these peptides by and By selectively this of the a in analyte complexity a of the total number of peptides due to the that serum protein on contains a few N-linked and a of complexity by the that significantly to the peptide We on to show that this method is and increased and throughput compared with the analysis of selective analyte we in a peptide patterns the serum proteome of mice from genetically identical untreated normal mice be and peptides be this method of the and high throughput of the serum We that in serum discovery were from from from were from mice of were to the skin L.A. A. A. of or of skin 74: Scholar). 
were were and were with The of mice were with a of the in of were with a for to that were well of with as early as and to for the A of these to mice were and blood by with a and to for were by The untreated mice the mice as by N-linked glycosylated peptides were and using the N-linked as described R. and of N-linked using and mass 2003; Scholar). from of serum were used in isolation and of formerly N-linked and peptides from of serum were used in mass spectrometry tryptic peptides from serum proteins, proteins from of serum were in of for The proteins were with of and the proteins were The peptides were reduced by for and by for The peptides were and in from of serum of serum were used for The peptides and proteins were using analysis using an mass as described S.A. M.H. R. analysis of protein using Scholar). quantitative analysis of peptides using an mass peptides from of serum sample using the method were a peptide with using a and a with The from the a electrospray ionization in peptides were and the mass The peptide a high for ESI, and the were to of a of E.C. R. A liquid chromatography electrospray ionization of peptide tandem mass spectrometry the level on an mass with Mass 2003; Scholar). A of from a of the data were with a in the mass between and with and of the peptide mass were from data analysis The were used for analysis for with the of samples. data a of or to data for this and be C. and R. in The use data by analysis of formerly N-linked glycopeptides from serum and the to peptides that are of in cancer and normal A of peptide from The this a of a for the analysis of data A. R. proteomic analysis by mass 2003; Scholar). to the that in peptides are in were the to detected in patterns were on peptide The used to peptides with the The for in the in peptide by the the significantly the sample the high mass in the and the that highly and peptide patterns. 
the peptides that in of in were for further quantitative of peptides in for peptide using the method as described in the for data R. statistical analysis of protein from data by and tandem mass 2003; Scholar). the from of a peptide and and for peptide the for of the peptide were The a level in and that from the the The of peptides with their were to unsupervised clustering D. analysis and of Natl. Acad. Sci. U. S. A. to identify peptides cancer from normal samples. to the data were to and the of peptide in of the total were used for clustering The of the method is the of peptide patterns the serum to the detection of peptides that between of and the of these The method is in and consists of that N-linked in the protein were in their deglycosylated using a recently described method R. and of N-linked using and mass 2003; Scholar). peptides were by to generate and patterns. patterns from were and the peptides were peptides and the proteins from were by tandem mass spectrometry and data the of the method for serum protein serum from genetically identical were using the N-linked and the peptides were by The resulting collision-induced were the data and the data results were further using the A. R. statistical to the of peptide by and 2002; 74: Scholar). of the in peptide from the data with peptide of to a of A. R. statistical to the of peptide by and 2002; 74: The were for the of the N-linked The number of proteins by the peptides were using R. of proteins using and mass Scholar). The of peptides and proteins are in on and the of these peptides are in on The number of proteins and peptides are in A total of peptides were proteins. 
of the of and of proteins the N-linked number of peptide and proteins and the of that of of of peptides of proteins in a new The peptides as the N-linked can be The contains peptides that are and the contains peptides that are by the the by the statistical further the of the isolation we the of peptides the N-linked as a of the The data are in It is that the of peptides the as the of the as the number of peptide and with the data in the of peptides the as the approaches We that the peptide isolation method used has a that is than We used the data described to the in sample complexity via the A total of proteins from the serum on the complexity by protein the proteins were expected to generate an of tryptic peptides per protein peptides on the and were N-linked glycosylated peptides the proteins in this an of peptides N-linked per protein were and By the number of N-linked with the number of peptides the N-linked we that of the glycosylated peptides been an analysis of the of potential N-linked in in the data S.M. analysis of the protein of for and 2004; Scholar). these data indicate that the and from serum proteins significantly sample complexity and that the method a of the N-linked glycosylated the increased sensitivity by sample complexity to detect serum protein biomarkers of we data in this to the of human serum proteins The and Scholar, A of serum plasma for Scholar). We are of a of the protein between the human and serum the serum two-dimensional of human and are to an of the of the proteins in this between human and A. M.L. A serum two-dimensional gel to and 2004; Scholar). the proteins proteins are to be in human serum low These include and II, and for and serum of the proteins in have been in the two-dimensional that are low in serum A. M.L. A serum two-dimensional gel to and 2004; Scholar). 
the detection the of the peptides from these proteins were using the of the the used for peptide of the an peptide of is than the for these experiments that multidimensional serum proteins on the of be detected by of formerly N-linked of formerly N-linked glycopeptides from sera and the of their proteins in human N-linked peptide between the are by plasma to and to N-linked peptide between the are by to in a new the peptides and proteins by peptides and proteins were from The number of peptides in is low compared with the total number of peptides We used the D. E.C. R. A to and data by liquid 2004; 76: to these were due to peptide in the or the data in the is as the patterns of the peptides from serum were peptides by due to the complexity of the peptides in a analysis were for and for are with and peptides from these are with or on the of peptide as as be from the between the number of peptides from analysis and total peptides in a sample from a of the high peptides from were selectively by the between by between were by the that a of total peptides by analysis in the of peptides in in of the are by these results that of glycopeptides from genetically identical mice are using due to peptide results in a number of peptide and a of the We the of the peptide patterns by from a serum sample were in to generate and by to we of and the sample by using a were used to detect in the resulting to and to peptide between patterns. C. and R. in these we the of and coefficient of for A of from the of identical by is in in The and in the of the sample were and We glycopeptides from the as described to with to peptide data is in in The and for the sample were and and comparable to the from analysis of identical samples. These indicate that sample significantly to the of peptide patterns. the hypothesis that the serum proteome from in physiological can be we the method to serum from mice in skin been and from normal untreated littermates. 
were in a well skin via treatment of the skin with a of by with the L.A. A. A. of or of skin 74: Scholar). treatment to that are and well of the from a M. M. S. A. from and can as of skin Scholar, M. M. A. and of skin Scholar). a of a of these to the sera of mice and and untreated normal mice and from the glycopeptides were and by as described The sample from by and a total of patterns were peptide from peptide were to in of the from normal or The patterns of the peptide between the and their were to unsupervised clustering D. analysis and of Natl. Acad. Sci. U. S. A. Scholar). nor about the of normal The results of this unsupervised clustering analysis are by the A of the peptides a of peptides is in in a peptide and a serum The of the are to the of the peptide patterns. this is that the mice and were and from the patterns from their and normal mice and were the serum be the we tryptic peptides from of serum sample to the and analysis were from the resulting and a number of peptide were detected as for the samples. to the unsupervised clustering of the total serum peptide patterns the cancer from the normal These results indicate that the number of proteins the the serum proteome by the is to the between serum to the clinical state of the The were further by to identify peptides that in in mice compared with untreated normal The and of these peptides were to the on a tandem mass and by and data shows a peptide of variation between consistently increased in mice and compared with normal and The of as a peptide with the the formerly N-linked from serum in is an protein is to be elevated Anderson and analysis of serum Scholar). We further the of the peptides by quantitative analysis using these the of the glycopeptides were with and the peptides were to the support their isolation R. and of N-linked using and mass 2003; Scholar). 
of from mice and and normal mice and were with the and and the peptides were in the sample with sample sample with sample sample with sample and sample with sample The were by The of peptides with in mice using analysis and were and the for and peptides were in the mass a for the of and a for the by analysis using and by data the peptides and proteins with elevated protein level in the detected by analysis and by The are in on The for the peptide from serum is in The increased level of this peptide in mice by with that by analysis these data indicate that the analysis of formerly N-linked glycosylated peptides detected peptides of in serum of cancer and normal mice and that the peptides be by of peptides and proteins with elevated in N-linked peptide between the are by number in on N-linked peptide between the are by in a new We a method for high throughput quantitative analysis of serum proteins using and It consists of the selective and isolation of peptides from the serum proteome that are by N-linked in the intact The of the deglycosylated of these peptides by The mass of peptides using and these peptides were by results indicate that the method is for the isolation of N-linked peptides were per protein an of per is with a tryptic peptides per protein from the of proteins. The data indicate that this reduced sample complexity in an in sensitivity compared with the analysis of serum digests using an identical for analysis of the method to the of sera from genetically identical mice that were untreated normal or The resulting peptide patterns and be via unsupervised of the peptides were further by MS/MS, and their in cancer control mice by using for the detection and validation of protein biomarkers in the serum of clinical be and to the complexity of the serum proteome and the proteomic technologies for can sample a of the the proteins Anderson N.G. The human plasma proteome: and diagnostic Cell. Proteomics. 2002; 1: Scholar, W. R. 
and tandem mass a for the quantitative analysis of and 2004; Scholar). two-dimensional gel have about serum proteins Anderson N.G. The human plasma proteome: and diagnostic Cell. Proteomics. 2002; 1: Scholar, R. C.R. A.M. M. R. Anderson S. The human serum proteome: of protein on two-dimensional electrophoresis gels and of 2003; 3: Scholar, M. R. R.S. Conrads T.P. Veenstra T.D. J.N. Pounds J.G. R. A. The human plasma proteome: a by of Cell. Proteomics. 2004; 3: Scholar). It has been that approaches have detection of low proteins due to the high of serum proteins and the of the (6Diamandis E.P. Mass spectrometry as a diagnostic and a cancer biomarker discovery tool: opportunities and potential limitations..Mol. Cell. Proteomics. 2004; 3: 367-378Google Scholar). the method the selective isolation of the N-linked glycosylated peptides in a in the of protein detected due to the in sample A number of to this the number of peptides per protein the method is significantly The proteins in this are to generate an of tryptic peptides per on the N-linked and can be and an of peptides N-linked per protein were By a number of N-linked per protein by and Y. M. and mass spectrometry to identify N-linked 2003; per in a in N-linked glycopeptides were from proteins using the serum protein, N-linked and is to the of total serum protein of peptides that serum peptide samples. quantitative of a that is by use of R. Anderson S. an a of the human plasma 2003; 3: Scholar), is an of the the method peptides from the of and the number of is of total protein mass in serum The and and a of an Anderson N.G. The human plasma proteome: and diagnostic Cell. Proteomics. 2002; 1: Scholar). The of the of in serum proteome recently in a in a tryptic of serum by strong the plasma protein were Y. Jacobs J.M. Camp II, D.G. Fang R. Moore R.J. Smith R.D. Xiao W. Davis R.W. Tompkins R.G. Ultra-high-efficiency strong for high of the human plasma 2004; 76: Scholar). 
It is that an more of peptides in patterns of serum protein digests are from and protein data the of to serum proteins are by and resulting in for It has been that protein generate on the of Anderson N.G. The human plasma proteome: and diagnostic Cell. Proteomics. 2002; 1: Scholar). the of the are the complexity of the peptide The peptides by the method the and by a few peptides per protein of The of these is the of a peptide sample from the serum proteome with a of an of peptides per an of potential N-linked glycopeptides an is for the serum proteins of these potential N-linked were of these potential N-linked be S.M. analysis of the protein of for and 2004; Scholar), the peptides from be by mass spectrometry, or protein be by the protein as the the number of peptides from increased due to of protein and in the It is expected that the to an of the number of peptides digests of serum were of peptides from of serum using the we were to detect and peptide that were of in with a of peptides were due to the complexity of the sample and the that the mass to a of the the highly peptides in sample The the of the of protein using this we used for quantitative and this to the peptide in including from proteins of low the of peptide is for of the proteome per is that to the of proteins are in this is that the of proteins are in glycosylated and blood and as markers for diagnosis and Scholar), proteins that a of biomarkers serum the of peptides per protein the of the this in protein level or level markers that protein including be detected on a peptides by a potential disease markers that are due to and blood and as markers for diagnosis and Scholar). this we used the and analysis to serum from mice with skin cancer from that of littermates. 
this the mice with skin cancer and their untreated the and in the The a with skin cancer the The sera were by the of consistently increased or between the cancer and control in this the low number of to detect the of the method to of potential biomarkers in more human the analysis of sample to statistical validation of the The has throughput to a few a number that to generate results a M. R. J.D. M.L. M. M. Y. of biomarker for early detection of Natl. Scholar, Y. Davis Y. protein coupled with a prostate cancer from prostate and 2002; Scholar). By a to sample and by further analysis and the of a data analysis we are further the performance of the to the used and the detected in the method are molecular peptides in between and These for in a tandem mass are By the of peptides to an we have serum proteins for the is increased in with the of skin cancer in mice these proteins are of and have been to the in of cancer S. C. S. D. R. C. from cancer a protein composition Cell. Proteomics. 2002; 1: Scholar), are markers for the diagnosis of skin useful for cancer detection, or be proteins in from the of a of the to the or in the serum the detection of proteins or patterns of proteins, is that are that potential markers or signatures in and can be and the proteomic biomarker discovery to molecular signatures as the of and to between biomarkers and The of peptides in this that some of the proteins in in the skin cancer are to highly serum cancer markers in clinical use have in the (6Diamandis E.P. Mass spectrometry as a diagnostic and a cancer biomarker discovery tool: opportunities and potential limitations..Mol. Cell. Proteomics. 2004; 3: 367-378Google has that the method and by are about of from the sensitivity to detect proteins. 
The method has the potential to sensitivity and high performance are used a of of serum sample contains of prostate-specific an that is detected in a mass serum digests are on the the total of serum that can be to the be and the of detection be reduced compared with the prostate-specific antigen be well the detection of an further in the of detection were the method be with peptide including electrophoresis or chromatography or selectively peptides from N-linked glycosylated serum proteins has been to be a method for the analysis of the serum with the high of this the high level of serum proteome a throughput that this method be useful for the detection of proteins or protein patterns that in physiological We and for We and for with and analyzing the data from the serum samples.

A Software Suite for the Generation and Comparison of Peptide Arrays from Sets of Data Collected by Liquid Chromatography-Mass Spectrometry

Xiaojun Li, Eugene C. Yi, Christopher J. Kemp et al.|Molecular & Cellular Proteomics|2005

Cited by 189 | Open Access

There is an increasing interest in the quantitative proteomic measurement of the protein contents of substantially similar biological samples, e.g. for the analysis of cellular response to perturbations over time or for the discovery of protein biomarkers from clinical samples. Technical limitations of current proteomic platforms such as limited reproducibility and low throughput make this a challenging task. A new LC-MS-based platform is able to generate complex peptide patterns from the analysis of proteolyzed protein samples at high throughput and represents a promising approach for quantitative proteomics. A crucial component of the LC-MS approach is the accurate evaluation of the abundance of detected peptides over many samples and the identification of peptide features that can stratify samples with respect to their genetic, physiological, or environmental origins. We present here a new software suite, SpecArray, that generates a peptide versus sample array from a set of LC-MS data. A peptide array stores the relative abundance of thousands of peptide features in many samples and is in a format identical to that of a gene expression microarray. A peptide array can be subjected to an unsupervised clustering analysis to stratify samples or to a discriminant analysis to identify discriminatory peptide features. We applied the SpecArray to analyze two sets of LC-MS data: one was from four repeat LC-MS analyses of the same glycopeptide sample, and another was from LC-MS analysis of serum samples of five male and five female mice. We demonstrate through these two study cases that the SpecArray software suite can serve as an effective software platform in the LC-MS approach for quantitative proteomics.

The identification and quantification of the protein contents of biological samples plays a crucial role in biological and biomedical research (1 Tyers M. Mann M. From genomics to proteomics. Nature. 2003; 422: 193-197; 2 Aebersold R. Mann M. Mass spectrometry-based proteomics. Nature. 2003; 422: 198-207; 3 Hanash S. Disease proteomics. Nature. 2003; 422: 226-232; 4 Boguski M.S.
McIntosh M.W. Biomedical informatics for proteomics.Nature. 2003; 422: 233-237Google Scholar). Due to the large dynamic range and the high complexity of most proteomes, it is very challenging to identify and accurately quantify the majority of proteins from such samples. LC-MS/MS-based methods are currently most efficient for the identification of a large number of proteins and have been widely applied in biological and biomedical research (5Washburn M.P. Wolters D. Yates III, J.R. Large-scale analysis of the yeast proteome by multidimensional protein identification technology.Nat. Biotechnol. 2001; 19: 242-247Google Scholar, 6Washburn M.P. Ulaszek R. Deciu C. Schieltz D.M. Yates III, J.R. Analysis of quantitative proteomic data generated via multidimensional protein identification technology.Anal. Chem. 2002; 74: 1650-1657Google Scholar, 7Gygi S.P. Rist B. Gerber S.A. Turecek F. Gelb M.H. Aebersold R. Quantitative analysis of complex protein mixtures using isotope-coded affinity Biotechnol. Scholar). with such methods can accurately quantify proteins S.P. Rist B. Gerber S.A. Turecek F. Gelb M.H. Aebersold R. Quantitative analysis of complex protein mixtures using isotope-coded affinity Biotechnol. Scholar, Aebersold R. Quantitative of proteins using isotope-coded affinity and Biotechnol. 2001; 19: Scholar, B. Mann M. by in as a and accurate approach to expression 2002; Scholar, B. M. Mann M. A to applied to Biotechnol. 2003; Scholar, D. of protein expression and S. Scholar, C. for with two of Chem. 2001; Scholar, B. S. S. S. S. S. S. M. F. protein in using Scholar). a LC-MS/MS-based quantitative proteomic samples to be are and and The peptide samples are by a multidimensional and by are by Mann M. for of large or by M. F. of proteins with Chem. and peptide are in the of for by Yates III, J.R. S. by S. Scholar, R. Mass in 2001; Scholar). are by using an such as Yates III, J.R. approach to data of peptides with in a protein Mass or D.M. 
protein identification by using to their a protein are by using a quantification software such as Aebersold R. Quantitative of proteins using isotope-coded affinity and Biotechnol. 2001; 19: or Aebersold R. analysis of protein abundance from data generated by and Chem. 2003; that the relative of the to the relative abundance of The identification and quantification of proteins is by the from the peptides that with the protein Aebersold R. analysis of protein abundance from data generated by and Chem. 2003; Scholar, Aebersold R. A for proteins by Chem. 2003; Scholar). The quantitative approach can identify and quantify to thousands of proteins from a biological sample is to analyze There is an increasing interest in the quantitative proteomic measurement of the protein contents of substantially similar samples. the discovery of protein biomarkers from clinical samples S. Disease proteomics.Nature. 2003; 422: 226-232Google Scholar, a serum analysis by multidimensional with 2002; Scholar, Aebersold R. throughput quantitative analysis of serum proteins using glycopeptide and and the measurement of the response of and to large of samples to be to to from proteome the the study of cellular response to and are for the identification of patterns of proteins that are by the B. Mann M. analysis of by quantitative Biotechnol. Scholar). these and similar high sample throughput and of the Quantitative is of limited for such large of of complex proteomic samples. peptide a of the peptides present in a sample is by the for D. Aebersold R. A to and data by Chem. Scholar). peptides are substantially similar samples are the of peptides that is in sample with increasing sample the peptides that are detected to be the high abundance peptides that generate many peptides are a it is very to quantitative low abundance proteins samples by the an to LC-MS/MS-based peptide by LC-MS been as a for quantitative Aebersold R. 
throughput quantitative analysis of serum proteins using glycopeptide and Scholar, S. S. M. of proteins and by or Chem. 2003; Scholar, D. S. S. platform for proteomic and discovery using Scholar). The LC-MS approach is the that the of peptide in a substantially similar sample identical is to the abundance of the peptide the dynamic range of the and is at to the abundance the dynamic one the relative abundance of a peptide in samples by the samples identical LC-MS and by of the same peptide in LC-MS S. S. M. of proteins and by or Chem. 2003; Scholar, D. S. S. platform for proteomic and discovery using Scholar, D. and relative of protein mixtures by by Chem. 2002; 74: Scholar). The relative abundance is quantitative dynamic range and the The LC-MS approach for quantitative can be in the are and from samples and samples are by an using a high high or identical peptides can be with to accurate and a limited number of can be to identify a of the detected and are applied to peptide peptide relative abundance from LC-MS and identify a of peptides that stratify samples with respect to their genetic, physiological, or environmental the that the of the detected peptides is identical peptides in samples can be by their their and their and of discriminatory peptides is a to identify the of such peptides by and Aebersold R. throughput quantitative analysis of serum proteins using glycopeptide and Scholar). the relative abundance of thousands of peptides and the of discriminatory this a of discriminatory proteins the present a new software suite, SpecArray, that the of the LC-MS approach for quantitative proteomics. The software suite a set of LC-MS data as and a peptide versus sample array that stores the relative abundance of thousands of peptide features in samples. The format of a peptide array is identical to that of a gene expression that peptide features gene in a peptide D. analysis and of expression S. 
peptide can be subjected to unsupervised clustering analyses as or to sample to discriminant analyses as or discriminant to identify peptides samples of The SpecArray software suite five software The LC-MS data in to data D. Aebersold R. A to and data by Chem. the high from the to peptide features from the peptide features and the generates a peptide array from peptide features in samples. the of time in a time time of time of two samples and the to peptides of samples. to sample or Aebersold R. analysis of protein abundance from data generated by and Chem. 2003; peptide relative We applied the SpecArray software suite to analyze two sets of LC-MS data: one was from four repeat LC-MS analyses of the same glycopeptide sample, and another was from LC-MS analysis of serum samples of five male and five female mice. by these two study the SpecArray software suite is very for LC-MS data and can serve as an effective software platform in the LC-MS approach for quantitative proteomics. serum samples from five male and five female of the same at the of using a Aebersold R. throughput quantitative analysis of serum proteins using glycopeptide and Scholar). A sample was from a male at the of using a similar that was in a and at for at peptides from of sample using the glycopeptide as Aebersold R. throughput quantitative analysis of serum proteins using glycopeptide and Scholar, Aebersold R. and quantification of using and Biotechnol. 2003; Scholar). from of serum or sample by a as Aebersold R. throughput quantitative analysis of serum proteins using glycopeptide and Scholar). was identical in sample data the format using the R. M. B. B. R. S. R. R. Aebersold R. A of data and to Biotechnol. Scholar). The software LC-MS data. The is in and the five A is generated from by using the A for the Scholar, and Scholar). A is generated from that of by and in the is as the of that are a of the are from using a of and is from the The as by and a a is from the current R. M. B. B. R. S. R. 
R. Aebersold R. A of data and to Biotechnol. for an format to of an LC-MS of the format are in The format for to and the The the to the range and the time range data are and such as the is can be The software peptide features from LC-MS data as in The the in are with peptide to identify peptide features from the with the most in the the software the of the by the the most is by the of a the software the peptide M.W. of and for large from Mass Scholar, M. R. M. B. of for analysis of complex Mass 2001; and the that the of a peptide be the most of the the software the by that the most is by or of a at one of the the a peptide is from the the the of the A peptide is by and the time of the The and the of the peptide are by the of the cases the of the most be such cases are the one that the of is that are to the peptide are from the peptide can be the most is The identification is for the in the a this the a of peptide features from peptide features be generated by the same peptide in features of in similar a of and the same are with and of a are A is for peptide by of the of the peptide and the in the Aebersold R. analysis of protein abundance from data generated by and Chem. 2003; the is and the in the is features at is the are The peptide features are as the of the peptide is by and the time at the the of the the abundance of the and the at the the of the The software peptide features from samples. 
The can be in the features of two samples are their are identical and their are a of this peptide time is and a peptide in one sample with peptide features in another The software from peptide the the two LC-MS The is in the time versus time the two The the peptide and and the is The software the peptide and the the the and the and the of a to a for the peptide The the of a peptide a very low a of are peptide that at one peptide the one the is and are peptide have a the of and peptide with one peptide in another this the the two LC-MS analyses is the peptide from the two the software a for peptide in The is similar to that in that the of a peptide to the and the the two peptide features. The peptide are in the same as in this peptide features the two LC-MS analyses are to are for samples peptide features are two samples. features of samples are a peptide the peptide of sample the peptide as a and by the by are in the of are as and in the the the peptide stores peptide features of samples. The software generates a peptide array from a peptide of There are five in the the software a of peptide features from the peptide as to the peptide The software a to samples by and the number of for the peptide a peptide the of at one The software a to a range and a time range for peptide features. peptide features are for the peptide The software generates an peptide versus sample array from peptide features. The of the array is the of samples, the is the of peptide and an array the abundance of the peptide in the a peptide is in a sample, the array is in the peptide the software the peptide in the LC-MS data to the is The time of the in the sample is from of the in samples using the the samples to time peptide are the abundance of the is as in the and in the peptide the array is The software a Aebersold R. analysis of protein abundance from data generated by and Chem. 
2003; to peptide abundance that from in sample or LC-MS The software a of peptide features that in at of samples in the peptide a peptide in a sample, the abundance in the sample and abundance samples is The software abundance of peptide features in sample and a for the Aebersold R. analysis of protein abundance from data generated by and Chem. 2003; to a for the abundance in the peptide array is by the the the software the abundance of peptide samples, peptide abundance in samples by the and the relative abundance of the peptide in samples. The of the software is a peptide array the relative abundance of peptide features in samples. features in the peptide array are by their time and the of the samples, a to in a peptide array or to with a very low abundance The peptide array is that it can be by the clustering D. analysis and of expression S. Scholar). the software a to The D. analysis and of expression S. was for unsupervised clustering analysis of peptide array data and samples and peptide features using clustering with The D. analysis and of expression S. was to the clustering in the software suite SpecArray generates a peptide array from a set of LC-MS data in five and is by one of the five software and We in the these five in and their features with data from four repeat LC-MS analyses of an glycopeptide sample that was from of a male Aebersold R. throughput quantitative analysis of serum proteins using glycopeptide and Scholar, Aebersold R. and quantification of using and Biotechnol. 2003; Scholar). LC-MS data in format R. M. B. B. R. S. R. R. Aebersold R. A of data and to Biotechnol. is by the software to data D. Aebersold R. A to and data by Chem. for an of a the LC-MS approach for peptide it is crucial to high LC-MS data. is for data to the of the and of the such as low peptide or LC-MS can be by D. Aebersold R. A to and data by Chem. Scholar). LC-MS data are data are are high data are from LC-MS data in the are data by the software LC-MS of data. 
peptide from such data is an in the analysis of LC-MS data. The software such as the A for the Scholar, and to peptide from LC-MS data as the of this two by in was generated from LC-MS data in was from the data. is that the data most peptide the majority of Due to peptide are the in in are in to the of The of the data is that of the data: the from the four repeat LC-MS analyses was in format and in the format in the a one the software to the of LC-MS and the of data a of peptide features are from LC-MS data by the software The LC-MS approach to of peptides and to identify the it is crucial for the LC-MS approach to peptide features from LC-MS data. to that most peptide features are in from peptides as a peptide by peptide and time and to the abundance and the of the A and peptide features from data of the four repeat LC-MS peptide features from one of the analyses the peptide features are with of their most peptide features by the The number of peptide features from the four LC-MS data an of a of and a of of The in the number of peptide features an that LC-MS data are peptide features by the same peptides in features of the four LC-MS analyses can be and of peptide features in one the software to and be the of the peptide features of four LC-MS a of peptide features in the in in in and in of peptide features in the in a of peptides in the in a Aebersold R. analysis of protein abundance from data generated by and Chem. 2003; Scholar). The in the peptide the two from in in peptide and protein with in Chem. 2003; Scholar). The the that peptide are and peptides by an such as and be detected as peptide features D. Aebersold R. A to and data by Chem. Scholar). 
the of samples and the of the LC-MS peptide features from one LC-MS data features from samples are by the software that the abundance of the same peptide present in samples can be most LC-MS the peptide is from to to a two peptide is a to time two LC-MS analyses and peptide features their and time as the time of peptide features in two LC-MS analyses with the The peptide features and the was and the peptide features was the of the the of peptide features in the two analyses in peptide features with in with in The of most with a in the that the is The the same as that of two is that peptide features of high and high are of low or low The in are most to as by their low A peptide versus sample array is generated from peptide features by the software The peptide array the relative abundance of peptide features in samples and is the of the SpecArray software The of a peptide array from peptide features is We in a of the peptide array that was generated from the four LC-MS features in the peptide array to at two peptide features that analyses are in a peptide peptide features that in analyses be by peptide features can be a a set of peptide features for the peptide There a of peptide features in the peptide array peptide relative abundance four peptide relative abundance and peptide relative abundance of array be by peptide peptide or in low abundance peptides in analyses by or by one peptide features that are samples a peptide of the same peptide in as in the peptide one to features of the same peptide a in the peptide is in to the that features in have features in high be samples, in low be in or many samples. 
a it is to features The peptide features in the peptide array to peptides one in the peptide and two The of the relative abundance of the same peptide in a of a of and a of the relative abundance of the same peptide in was The of peptide relative abundance in analyses is an in the of the peptide the four repeat the data are in The a of a of and a of the LC-MS approach is able to peptide relative abundance with an of the be detected by the it to the LC-MS approach to discriminatory peptides with abundance in samples. demonstrate the of the LC-MS approach for large sample serum samples from five male and five female of the same at the of that are in the proteins in their using Aebersold R. throughput quantitative analysis of serum proteins using glycopeptide and Scholar, Aebersold R. and quantification of using and Biotechnol. 2003; Scholar). been that the sample is Aebersold R. throughput quantitative analysis of serum proteins using glycopeptide and Scholar). glycopeptide samples by an identical We applied the SpecArray software suite to analyze the LC-MS data. four that generated from these data. of samples are very We identical to generate these that the of a peptide the of the The of these that the samples similar peptide The from the sample is that from samples to peptide in the The of LC-MS analyses the of to such The number of peptide features that detected from samples is in The number from to the is The a the number of peptide features and the as by of male and male in peptide features detected from samples of male from samples of female mice. is The number of peptide features detected from the serum samples is to that from the four repeat LC-MS analyses of the the is that of the four repeat The in was to biological or to of of peptide features detected from glycopeptide samples of in a new We four a peptide array from the serum data. The are in the peptide features in at two of the samples The peptide array a of peptide features. 
of the features samples, and of array be to the low of peptides in the serum samples and biological be to and software The is the for most in this peptide the peptide features in five of the five samples of male or five of the five samples of female or The peptide array a of peptide features. of the features samples, and of array relative abundance in this peptide array is that in the peptide the of the number of peptide features in peptide that generated from the number of peptide features that samples was the same peptide features in peptide the large of array the same was in peptide that generated from the same set of LC-MS data. the software and peptide features samples at a low at a high of peptide features that to a number of samples applied to generate a peptide array from LC-MS data of of of the samples of male and female of the samples of male and female or of the five samples of male or two of the five samples of female or of the five samples of male or two of the five samples of female of the samples of male and female of the five samples of male or two of the five samples of female mice. in a new is to the of the relative abundance of the same peptide in samples from a peptide We in the from the or peptide The samples of male a of and a of the samples of female and with the the four repeat LC-MS analyses of same sample samples of the same of it to features. from sample and sample to the been that from sample are Aebersold R. throughput quantitative analysis of serum proteins using glycopeptide and The samples of of a of and a of There was a in the of in with of mice. was to biological male and female mice. it that in peptide abundance to mice. is to this The from four peptide in are in the large in the number of peptide features peptide very similar to the of peptide We an unsupervised clustering analysis of the or peptide array using the Aebersold R. throughput quantitative analysis of serum proteins using glycopeptide and Scholar). 
A of the clustering was in in the with respect to their female and with male mice. from four peptide in the male and female the these samples. this that in a peptide array be to two the peptides are in the samples, or and software the peptides to be detected or in the samples. repeat it is to the two origins. The as and such data in the unsupervised clustering that in the or peptide array from the with of the relative abundance in the array in a unsupervised clustering in the new clustering for a of peptide features of the same analysis was applied to peptide in female and female with male that unsupervised clustering analysis can be by biological and by and software The of unsupervised clustering analysis of peptide be with for sample A of peptide features in the or peptide array a peptide features that can male from female applied to the discriminatory of the peptide features from peptide relative abundance in the peptide The are in The five most discriminatory peptides a of and an the and the relative abundance of the most discriminatory peptide in The generated with the same that the of the peptide in samples the peptide the large in the relative abundance of the peptide of the same the relative abundance of male was that of female in this the and the most discriminatory peptides is in The two features a of and a time of is the two features e.g. by of at in for the Scholar). this peptide features that present in one A of peptides a Due to the sample and the in peptide very peptide features a in male from female mice. the peptide array to such discriminatory by their and time and can be for identification Aebersold R. throughput quantitative analysis of serum proteins using glycopeptide and Scholar). We present here a new software suite, SpecArray, that generates peptide from sets of LC-MS data. 
We data from four repeat LC-MS analyses of a glycopeptide sample to the features of We that the SpecArray software suite was able to from LC-MS data accurate thousands of peptide features samples. We glycopeptide and LC-MS approach to serum proteins of five male and five female mice. We applied SpecArray to peptide features that male from female mice. We through these two samples that the SpecArray software suite the analysis of LC-MS data and is a very software platform for the LC-MS approach to large protein in quantitative proteomics. The SpecArray software suite in an array format that is identical to that of a gene expression microarray. for a of the LC-MS approach and the SpecArray software suite in quantitative proteomics. new platform of quantitative can be one to the protein contents of a large number of substantially similar samples as in the of time discriminatory peptides and using to identify the the platform a most current proteomic platforms of proteins of biological or The array format it to analysis to the analysis of SpecArray the of biological from LC-MS data. There is that the LC-MS approach and the SpecArray software suite are at their and software are present in peptide sample LC-MS and data analysis is and is to that be sample sample sample M.S. McIntosh M.W. Biomedical informatics for proteomics.Nature. 2003; 422: 233-237Google Scholar). LC-MS are to in peptide time and peptide and to the of peptide features by sample increasing and We have that one a and LC-MS analysis of a samples LC-MS such as and such as and the and the reproducibility of peptide R. D. M. for peptide analysis with an sample and Chem. Scholar). such as the and in peptide S. M.W. B. B. S. and in the analysis of Scholar). these the of LC-MS data. 
data low abundance or peptide features be to A of peptide features be by the software methods are to low abundance peptide peptide and peptide features that are from a new methods make in peptide the of the LC-MS approach and the of peptide are to with the sample LC-MS and data The SpecArray software suite is in The current the A new for the is software by the SpecArray software suite be an and be at with
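
As an illustration of the peptide versus sample array described in the abstract above, the following is a minimal sketch and not the SpecArray implementation. It assumes each peptide feature is summarized by a mass, a retention time, and an abundance, and it matches features across samples with a simple greedy tolerance check. The names `Feature`, `build_peptide_array`, `relative_abundance`, and the tolerance values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Feature:
    """One peptide feature from a single LC-MS run (hypothetical layout)."""
    mass: float       # peptide mass in Da
    rt: float         # retention time in minutes
    abundance: float  # integrated signal intensity


def build_peptide_array(
    samples: List[List[Feature]],
    mass_tol: float = 0.05,  # assumed mass tolerance in Da
    rt_tol: float = 1.0,     # assumed retention time tolerance in minutes
) -> Tuple[List[Tuple[float, float]], List[List[Optional[float]]]]:
    """Return (rows, array): rows holds the reference (mass, rt) of each
    peptide feature, and array[i][j] is the abundance of feature i in
    sample j, or None when the feature was not detected in that sample."""
    rows: List[Tuple[float, float]] = []
    array: List[List[Optional[float]]] = []
    for j, features in enumerate(samples):
        for f in features:
            match = None
            # Greedy match: first existing row within both tolerances
            # that has no value yet for this sample.
            for i, (m, t) in enumerate(rows):
                if (abs(f.mass - m) <= mass_tol
                        and abs(f.rt - t) <= rt_tol
                        and array[i][j] is None):
                    match = i
                    break
            if match is None:
                rows.append((f.mass, f.rt))
                array.append([None] * len(samples))
                match = len(rows) - 1
            array[match][j] = f.abundance
    return rows, array


def relative_abundance(
    array: List[List[Optional[float]]],
) -> List[List[Optional[float]]]:
    """Scale each row by the mean of its observed values, leaving
    missing entries as None."""
    out = []
    for row in array:
        observed = [v for v in row if v is not None]
        mean = sum(observed) / len(observed)
        if mean == 0:
            out.append(list(row))  # avoid division by zero
        else:
            out.append([None if v is None else v / mean for v in row])
    return out
```

The resulting rows-by-samples layout mirrors a gene expression matrix, so it could be passed to a generic clustering routine. Retention time alignment between runs, which a real pipeline would likely need before matching, is deliberately left out of this sketch.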

Top publications by citations