National Institute of Genetics
ORCID: 0000-0001-6706-0487
Publishes on Genomics and Phylogenetic Studies, Metabolomics and Mass Spectrometry Studies, Microbial Metabolic Engineering and Bioproduction. 269 papers and 16.9k citations.
MassBank is the first public repository of mass spectra of small chemical compounds for life sciences (<3000 Da). The database contains 605 electron-ionization mass spectrometry (EI-MS), 137 fast atom bombardment MS, and 9276 electrospray ionization (ESI)-MS(n) data of 2337 authentic compounds of metabolites; 11,545 EI-MS and 834 other-MS data of 10,286 volatile natural and synthetic compounds; and 3045 ESI-MS(2) data of 679 synthetic drugs, contributed by 16 research groups (January 2010). ESI-MS(2) data were analyzed under nonstandardized, independent experimental conditions. MassBank is a distributed database: each research group provides data from its own MassBank data servers distributed on the Internet. MassBank users can access either all of the MassBank data or a subset of the data by specifying one or more experimental conditions. In a spectral search to retrieve mass spectra similar to a query mass spectrum, the similarity score is calculated by a weighted cosine correlation in which the weighting exponents on peak intensity and mass-to-charge ratio are optimized for the ESI-MS(2) data. MassBank also provides a merged spectrum for each compound, prepared by merging the analyzed ESI-MS(2) data of an identical compound under different collision-induced dissociation conditions. Data merging has significantly improved the precision of chemical compound identification, by 21-23% at a similarity score of 0.6. Thus, MassBank is useful for the identification of chemical compounds and the publication of experimental data.
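The weighted cosine correlation mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not state the optimized exponents or how peaks are matched, so the function name `weighted_cosine`, the default exponents, and the fixed-width m/z binning are all assumptions made for illustration, not MassBank's actual implementation.

```python
import math
from collections import defaultdict


def weighted_cosine(query, reference, intensity_exp=0.5, mz_exp=2.0, bin_width=1.0):
    """Cosine similarity between two peak lists after weighting each peak.

    query, reference: lists of (mz, intensity) tuples.
    intensity_exp, mz_exp: weighting exponents; the defaults are illustrative
    placeholders, not the values optimized for MassBank.
    bin_width: assumed m/z bin width used to decide which peaks match.
    """
    def to_vector(peaks):
        vec = defaultdict(float)
        for mz, intensity in peaks:
            # Weight = intensity^a * (m/z)^b; peaks in the same bin are summed.
            vec[round(mz / bin_width)] += (intensity ** intensity_exp) * (mz ** mz_exp)
        return vec

    q, r = to_vector(query), to_vector(reference)
    # Dot product over bins present in both spectra.
    dot = sum(weight * r[b] for b, weight in q.items() if b in r)
    norm = math.sqrt(sum(w * w for w in q.values())) * math.sqrt(sum(w * w for w in r.values()))
    return dot / norm if norm else 0.0
```

Under these assumptions, two identical spectra score 1.0 and two spectra with no shared bins score 0.0. Raising `mz_exp` gives more influence to high-mass fragments, which tend to be more compound-specific.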
Plant metabolism is a complex set of processes that produce a wide diversity of foods, woods, and medicines. With the genome sequences of Arabidopsis and rice in hand, postgenomics studies integrating all "omics" sciences can depict precise pictures of whole-cellular processes. Here, we present, to our knowledge, the first investigation of gene-to-metabolite networks regulating sulfur and nitrogen nutrition and secondary metabolism in Arabidopsis, integrating metabolomics and transcriptomics. Transcriptome and metabolome analyses were carried out, respectively, with DNA macroarray and several chemical analytical methods, including ultra high-resolution Fourier transform-ion cyclotron MS. Mathematical analyses of the transcriptome and metabolome data, including principal component analysis and batch-learning self-organizing map analysis, suggested the presence of general responses to sulfur and nitrogen deficiencies. In addition, specific responses to either sulfur or nitrogen deficiency were observed in several metabolic pathways: in particular, the genes and metabolites involved in glucosinolate metabolism were shown to be coordinately modulated. Understanding such gene-to-metabolite networks in primary and secondary metabolism through integration of transcriptomics and metabolomics can lead to identification of gene function and subsequent improvement of production of useful compounds in plants.
Compound identification from accurate mass MS/MS spectra is a bottleneck for untargeted metabolomics. In this study, we propose nine rules of hydrogen rearrangement (HR) during bond cleavages in low-energy collision-induced dissociation (CID). These rules are based on the classic even-electron rule and cover heteroatoms and multistage fragmentation. We evaluated our HR rules by the statistics of MassBank MS/MS spectra in addition to enthalpy calculations, yielding three levels of computational MS/MS annotation: "resolved" (regular HR behavior following HR rules), "semiresolved" (irregular HR behavior), and "formula-assigned" (lacking structure assignment). With this nomenclature, 78.4% of a total of 18,506 MS/MS fragment ions in the MassBank database and 84.8% of a total of 36,370 MS/MS fragment ions in the GNPS database were (semi-) resolved by predicted bond cleavages. We also introduce the MS-FINDER software for structure elucidation. Molecular formulas of precursor ions are determined from accurate mass, isotope ratio, and product ion information. All isomer structures of the predicted formula are retrieved from metabolome databases, and MS/MS fragmentations are predicted in silico. The structures are ranked by a combined weighting score considering bond dissociation energies, mass accuracies, fragment linkages, and, most importantly, nine HR rules. The program was validated by its ability to correctly calculate molecular formulas with 98.0% accuracy for 5063 MassBank MS/MS records and to yield the correct structural isomer with 82.1% accuracy within the top-3 candidates. In a test with 936 manually identified spectra from an untargeted HILIC-QTOF MS data set of human plasma, formulas were correctly predicted in 90.4% of the cases, and the correct isomer structure was retrieved at 80.4% probability within the top-3 candidates, including for compounds that were absent in mass spectral libraries. The MS-FINDER software is freely available at http://prime.psc.riken.jp/.
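As a rough illustration of the accurate-mass part of formula determination described above, the sketch below checks whether a candidate formula matches an observed neutral monoisotopic mass within a ppm tolerance. It is not MS-FINDER's algorithm: the function names, the 5 ppm default tolerance, the restriction to C, H, N, and O, and the omission of adducts, isotope ratios, and product ion information are all simplifying assumptions.

```python
# Monoisotopic masses (Da) of the most abundant isotopes; only C, H, N, O are covered here.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503207,
    "N": 14.0030740048,
    "O": 15.99491461956,
}


def monoisotopic_mass(formula):
    """Theoretical monoisotopic mass of a formula given as {element: count}."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


def ppm_error(observed, theoretical):
    """Signed mass error of an observed mass relative to a theoretical mass, in ppm."""
    return (observed - theoretical) / theoretical * 1e6


def within_tolerance(observed, formula, tol_ppm=5.0):
    """True if the candidate formula explains the observed mass within tol_ppm (assumed default)."""
    return abs(ppm_error(observed, monoisotopic_mass(formula))) <= tol_ppm


# Hypothetical usage: test C6H12O6 against an observed neutral mass.
glucose = {"C": 6, "H": 12, "O": 6}
print(within_tolerance(180.0634, glucose))
```

A real formula generator would enumerate many candidate formulas and combine this mass filter with the other evidence the abstract names; the ppm check only narrows the candidate list.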