Elucidation of Gene-to-Gene and Metabolite-to-Gene Networks inArabidopsis by Integration of Metabolomics andTranscriptomicsMasami Yokota Hirai, Marion Klein, Yuuta Fujikawa et al.|Journal of Biological Chemistry|2005 Since the completion of genome sequences of model organisms, functionalidentification of unknown genes has become a principal challenge in biology.Post-genomics sciences such as transcriptomics, proteomics, and metabolomicsare expected to discover gene functions. This report outlines the elucidationof gene-to-gene and metabolite-to-gene networks via integration ofmetabolomics with transcriptomics and presents a strategy for theidentification of novel gene functions. Metabolomics and transcriptomics dataof Arabidopsis grown under sulfur deficiency were combined andanalyzed by batch-learning self-organizing mapping. A group ofmetabolites/genes regulated by the same mechanism clustered together. Themetabolism of glucosinolates was shown to be coordinately regulated. Threeuncharacterized putative sulfotransferase genes clustering together with knownglucosinolate biosynthesis genes were candidates for involvement inbiosynthesis. In vitro enzymatic assays of the recombinant geneproducts confirmed their functions as desulfoglucosinolate sulfotransferases.Several genes involved in sulfur assimilation clustered withO-acetylserine, which is considered a positive regulator of thesegenes. The genes involved in anthocyanin biosynthesis clustered with the geneencoding a transcriptional factor that up-regulates specifically anthocyaninbiosynthesis genes. These results suggested that regulatory metabolites andtranscriptional factor genes can be identified by this approach, based on theassumption that they cluster with the downstream genes they regulate. Thisstrategy is applicable not only to plant but also to other organisms forfunctional elucidation of unknown genes. Since the completion of genome sequences of model organisms, functionalidentification of unknown genes has become a principal challenge in biology.Post-genomics sciences such as transcriptomics, proteomics, and metabolomicsare expected to discover gene functions. This report outlines the elucidationof gene-to-gene and metabolite-to-gene networks via integration ofmetabolomics with transcriptomics and presents a strategy for theidentification of novel gene functions. Metabolomics and transcriptomics dataof Arabidopsis grown under sulfur deficiency were combined andanalyzed by batch-learning self-organizing mapping. A group ofmetabolites/genes regulated by the same mechanism clustered together. Themetabolism of glucosinolates was shown to be coordinately regulated. Threeuncharacterized putative sulfotransferase genes clustering together with knownglucosinolate biosynthesis genes were candidates for involvement inbiosynthesis. In vitro enzymatic assays of the recombinant geneproducts confirmed their functions as desulfoglucosinolate sulfotransferases.Several genes involved in sulfur assimilation clustered withO-acetylserine, which is considered a positive regulator of thesegenes. The genes involved in anthocyanin biosynthesis clustered with the geneencoding a transcriptional factor that up-regulates specifically anthocyaninbiosynthesis genes. These results suggested that regulatory metabolites andtranscriptional factor genes can be identified by this approach, based on theassumption that they cluster with the downstream genes they regulate. Thisstrategy is applicable not only to plant but also to other organisms forfunctional elucidation of unknown genes. In the era of post-genomics, a systematic and comprehensive understandingof the complex events of life is a great concern in biology. The first step inthis process is to identify all gene functions and gene-to-gene networks asthe components of the system, the whole events of life. The metabolome is thefinal product of a series of gene actions. Hence, metabolomics has a potentialto elucidate gene functions and networks, especially when integrated withtranscriptomics. A promising approach is pair-wise transcript-metabolitecorrelation analysis, which can reveal unexpected correlations and bring tolight candidate genes for modifying the metabolite content(1Urbanczyk-Wochniak E. Luedemann A. Kopka J. Selbig J. Roessner-Tunali U. Willmitzer L. Fernie A.R. EMBO Rep. 2003; 4: 989-993Crossref PubMed Scopus (270) Google Scholar). Gene functions involved inthe specific pathway of interest have been identified by the integration oftranscript and targeted metabolic profiling in experimental systems where thepathway was activated(2Aharoni A. Keizer L.C. Bouwmeester H.J. Sun Z. Alvarez-Huerta M. Verhoeven H.A. Blaas J. van Houwelingen A.M. DeVos R.C. van der Voet H. Jansen R.C. Guis M. Mol J. Davis R.W. Schena M. van Tunen A.J. O'Connell A.P. Plant Cell. 2000; 12: 647-662Crossref PubMed Scopus (450) Google Scholar, 3Guterman I. Shalit M. Menda N. Piestun D. Dafny-Yelin M. Shalev G. Bar E. Davydov O. Ovadis M. Emanuel M. Wang J. Adam Z. Pichersky E. Lewinsohn E. Zamir D. Vainstein A. Weiss D. Plant Cell. 2002; 14: 2325-2538Crossref PubMed Scopus (256) Google Scholar, 4Goossens A. Hakkinen S.T. Laakso I. Seppanen-Laakso T. Biondi S. De Sutter V. Lammertyn F. Nuutila A.M. Soderlund H. Zabeau M. Inze D. Oksman-Caldentey K.M. Proc. Natl. Acad. Sci. U. S. A. 2003; 100: 8595-8600Crossref PubMed Scopus (333) Google Scholar, 5Mercke P. Kappers I.F. Verstappen F.W. Vorst O. Dicke M. Bouwmeester H.J. PlantPhysiol. 2004; 135: 2012-2024Google Scholar, 6Tohge T. Nishiyama Y. Hirai M.Y. Yano M. Nakajima J. Awazuhara M. Inoue E. Takahashi H. Goodenowe D.B. Kitayama M. Noji M. Yamazaki M. Saito K. PlantJ. 2005; 42: 218-235Crossref PubMed Scopus (762) Google Scholar).However, up to now, no gene function has been identified by non-targetedanalyses of the transcriptome and metabolome. In this report, we analyzed thenon-targeted metabolome and transcriptome of a model plantArabidopsis under sulfur(S) 1The abbreviations used are: S, sulfur; BL-SOM,batch-learning-self-organizing mapping; OAS,O-acetyl-l-serine; GLS(s), glucosinolate(s); PAPS,3′-phosphoadenosine-5′-phosphosulfate; GST, glutathioneS-transferase; APS, ATP sulfurylase; Serat, serineacetyltransferase. deficiency whosegenome sequencing has been completed. Our strategy for integrated analysesusing batch-learning-self-organizing mapping (BL-SOM)(7Kanaya S. Kinouchi M. Abe T. Kudo Y. Yamada Y. Nishi T. Mori H. Ikemura T. Gene. 2001; 276: 89-99Crossref PubMed Scopus (155) Google Scholar, 8Abe T. Kanaya S. Kinouchi M. Ichiba Y. Kozuki T. Ikemura T. Genome Res. 2003; 13: 693-702Crossref PubMed Scopus (205) Google Scholar, 9Hirai M.Y. Yano M. Goodenowe D.B. Kanaya S. Kimura T. Awazuhara M. Arita M. Fujiwara T. Saito K. Proc. Natl. Acad. Sci. U. S. A. 2004; 101: 10205-10210Crossref PubMed Scopus (627) Google Scholar)enabled the identification of gene-to-gene and metabolite-to-gene networks andnew gene functions. Plant Materials—Arabidopsis thaliana ecotype Columbia wasgrown for 21 days on agar-solidified S-sufficient medium following themethodology of Ref. 10Hirai M.Y. Fujiwara T. Awazuhara M. Kimura T. Noji M. Saito K. Plant J. 2003; 33: 651-663Crossref PubMed Scopus (263) Google Scholar. Plantswere transferred to S-sufficient or S-deprived medium and grown for up to 1week (168 h). Rosette leaves and roots were harvested at 3, 6, 12, 24, 48, and168 h after transfer, immediately frozen with liquid nitrogen, and stored at–80 °C until use. Metabolome Analyses—Fourier-transform ion cyclotronresonance mass spectrometry analysis was used to conduct non-targetedmetabolomic profiling (6Tohge T. Nishiyama Y. Hirai M.Y. Yano M. Nakajima J. Awazuhara M. Inoue E. Takahashi H. Goodenowe D.B. Kitayama M. Noji M. Yamazaki M. Saito K. PlantJ. 2005; 42: 218-235Crossref PubMed Scopus (762) Google Scholar).Targeted metabolic profiling of amino acids,O-acetyl-l-serine (OAS), anions, organic acids, and sugarswas performed by high performance liquid chromatography and capillaryelectrophoresis as described(9Hirai M.Y. Yano M. Goodenowe D.B. Kanaya S. Kimura T. Awazuhara M. Arita M. Fujiwara T. Saito K. Proc. Natl. Acad. Sci. U. S. A. 2004; 101: 10205-10210Crossref PubMed Scopus (627) Google Scholar). Extraction of metaboliteswas conducted in triplicate from each sample. Among 2,123 metabolites detectedby targeted and non-targeted analyses, 84 metabolites whose coefficient ofvariation was greater than 0.9 were eliminated. For each metabolite thelogarithm of the ratio of the average signal intensity of S-starved samples tothat of the control samples was calculated. Transcriptome Analyses—The transcriptomes were analyzedusing the Agilent Arabidopsis 2 microarray (Agilent Technologies, Palo Alto,CA), which carries 21,500 Arabidopsis genes, according to themanufacturer's specifications. Data were acquired using Agilent FeatureExtraction software. Normalization of log ratio of expression intensitybetween S-starved and control samples was carried out based on MA plot(11Dudoit S. Fridlyand J. Speed T.P. J. Am. Stat. Assoc. 2002; 97: 77-87Crossref Scopus (1926) Google Scholar,12Quackenbush J. Nat.Genet. 2002; 32: 496-501Crossref PubMed Scopus (1512) Google Scholar). Initially, log ratioMi [=log(Ri/Gi)] andaverage of logarithmic intensity Ai[=(logRi + log Gi)/2] were calculatedfor ith gene. Here, Ri and Giare differences between mean signal and mean background intensities for Cy5dye (S-starved sample) and for Cy3 dye (control sample), respectively,obtained by Agilent Feature Extraction software. Normalized log ratioMi″ was estimated as the difference betweenMi and baseline Mi′. Here, usinga relation between Mi and Ai,(Mi = f(Ai) +ϵI, ϵi was thedifference between Mi andf(Ai) for gene i) by MA plot; thebaseline for ith gene was estimated by MI′= f(Ai). The genes whose signal intensity wasregarded as zero were eliminated in the present analysis. BL-SOM Analyses—BL-SOM analyses were conducted according toRef. 9Hirai M.Y. Yano M. Goodenowe D.B. Kanaya S. Kimura T. Awazuhara M. Arita M. Fujiwara T. Saito K. Proc. Natl. Acad. Sci. U. S. A. 2004; 101: 10205-10210Crossref PubMed Scopus (627) Google Scholar. The metabolites andtranscripts whose maximum log ratio through time course was <0.1 andminimum log ratio was >–0.1 were eliminated. For ∼1,000metabolites and ∼10,000 transcripts left after the elimination, the sum ofthe square of the 6 log ratio values at 6 time points was set equal to 1 togive relative log ratio values, and all data were combined into a singlematrix to be subjected to BL-SOM. ∼1,000 metabolites and ∼10,000 geneswere classified into 40 × 29 cells in the lattice formed by BL-SOM basedon the time-dependent pattern of accumulation/expression in leaves in responseto –S (Fig. 1A).In the same way, ∼1,000 metabolites and ∼10,000 genes were classifiedinto 40 × 24 cells in the lattice based on the time-dependent pattern ofaccumulation/expression in roots in response to –S(Fig. 1B). Each cellcontained ∼10 metabolites and/or genes on the average. Each cell wascolored according to the relative log ratio values of metabolites/genes in it.When all of the relative log ratio values of metabolites/genes in the cellwere greater or smaller than the average, the cell was colored in pink or paleblue, respectively. Red and blue indicated that at least one of the relativelog ratio values was greater than the average + S.D. or smaller than theaverage–S.D., respectively. DNA Cloning Techniques—The gene encoding sulfotransferase 18(At1g74090) from A. thaliana (AtSOT18) (for nomenclature,see Ref. 13Klein M. Papenbrock J. J.Exp. Bot. 2004; 55: 1809-1820Crossref PubMed Scopus (110) Google Scholar) does not containany introns. A 1053-bp cDNA encoding AtSOT18 was amplified from theλEMBL3 genomic library of A. thaliana ecotype Columbia(Clontech) with primer 5′-GGA TCC GAA TCA GAA ACC CTA-3′ extendedby a BamHI restriction site and primer 5′-AAG CTT TTT ACC ATG TTC AAGC-3′ extended by a HindIII restriction site. The PCR contained 0.2mm dNTPs (Roth, Karlsruhe, Germany), 0.4 μm eachprimer (MWG, Ebersberg, Germany), 1 mm MgCl2, 0.75 μlof Red Taq DNA polymerase (Sigma), and 1 μg of template DNA in afinal volume of 50 μl. Before starting the first PCR cycle, the DNA wasdenatured for 180 s at 94 °C followed by 28 PCR cycles conducted for 60 sat 94 °C, 60 s at 48 °C, and 60 s at 72 °C. The process wasfinished with an elongation phase of 420 s at 72 °C. The amplified PCRfragment was ligated into the expression vector pQE-30 (Qiagen, Hilden,Germany), and the vector was introduced into the Escherichia colistrain XL1-blue. Expression and Purification of the AtSOT18 Protein in E.coli—The recombinant protein was expressed according to thefollowing protocol. After growth of the E. coli culture at 37 °Cto an A600 of 0.6 in Luria Bertani medium containingampicillin (100 μg ml–1) (AppliChem, Darmstadt, Germany),induction was carried out for 2 h at 30 °C with 1 mm (finalconcentration) isopropyl-β-d-1-thiogalactopyranoside(AppliChem). Cell lysis was carried out by adding lysozyme (finalconcentration 1 mg ml–1) (Roth) and vigorously homogenizingusing a glass homogenizer and a pestle. The recombinant AtSOT18 protein waspurified under non-denaturing conditions by affinity chromatography with thenickel affinity resin according to the manufacturer's instructions (Qiagen),dialyzed overnight, and used for enzyme activity measurements. The purity ofthe protein preparation was checked by SDS-polyacrylamide gel electrophoresisand subsequent Coomassie brilliant blue and silver staining. Substrate Preparation and Enzyme Activity Measurement—Thedesulfo form of the intact allyl GLS, sinigrin (Sigma), was prepared accordingto Graser et al. (14Graser G. Schneider B. Oldham N.J. Gershenzon J. Arch. Biochem. Biophys. 2000; 378: 411-419Crossref PubMed Scopus (92) Google Scholar,15Graser G. Oldham N.J. Brown P.D. Temp U. Gershenzon J. Phytochemistry. 2001; 57: 23-32Crossref PubMed Scopus (81) Google Scholar). The enzyme assay withrecombinant AtSOT18 protein was set up in the manner of Glendening and Poulton(16Glendening T.M. Poulton J.E. Plant Physiol. 1990; 94: 811-818Crossref PubMed Scopus (34) Google Scholar). 15 μg of purifiedrecombinant protein was used in each reaction. The 300-μl assays contained:83 mm Tris/HCl, pH 9.0, 9.2 mm MgCl2, and 58μm 3′-phosphoadenosine-5′-phosphosulfate (PAPS;Calbiochem), and 6.2 mm desulfo-allyl GLS. The reaction was startedby the addition of PAPS, incubated for 30 min at 30 °C, and stopped byincubation for 20 min at 95 °C. The formation of intact allyl GLS wasanalyzed by high performance liquid chromatography according to Mellon etal. (17Mellon F.A. Bennett R.N. Holst B. Williamson G. Anal. Biochem. 2002; 306: 83-91Crossref PubMed Scopus (105) Google Scholar). Separation wasachieved with a gradient of 0.1% trifluoroacetic acid and acetonitrile with0.1% trifluoroacetic acid on an RP-C18 column (Supelcosil LC18-DB, 250 ×4.6 mm, 5 μm). The product was identified by its retention time, absorptionspectrum, and mass spectrum as described by Mellon et al.(17Mellon F.A. Bennett R.N. Holst B. Williamson G. Anal. Biochem. 2002; 306: 83-91Crossref PubMed Scopus (105) Google Scholar). Quantification of theproduct formation was done with a calibration curve prepared using authenticsinigrin. Time-dependent changes in the metabolome and the transcriptome wereanalyzed in a non-targeted way after Arabidopsis plants weresubjected to S deprivation for up to 168 h. ∼2,000 metabolites detected bytargeted and non-targeted analyses and 21,500 transcripts by DNA microarraywere used for calculation, and each was expressed as a vector insix-dimensional space, where six components of the vector were six log ratiovalues of signal intensities under S deficiency compared with those undercontrol condition at six time points. For integrated analysis of metabolome and transcriptome, we used BL-SOM,which classified the metabolites and the transcripts according to theirtime-dependent pattern of changes in accumulation and expression. BL-SOM is asophisticated form of multivariate analyses that can classify metabolites andtranscripts into cells on a two-dimensional lattice; those showing similarpatterns are clustered into the same or the neighboring cells. Afterelimination of the metabolites and the transcripts exhibiting no the sum of the square of 6 values was set to This to classify and the transcripts by the of the time-dependent and not by the of the of to the metabolites and the left were combined into a to be subjected to BL-SOM. are shown in changes in and of metabolites and transcripts with correlations in the and which are the of clustered into the cells in the specific GLS is coordinately A are and in the and are as to and and as S U. Gershenzon J. Plant 2002; Scopus Google U. Plant Sci. 2002; PubMed Scopus Google Scholar). are such as and The pathway of GLS has and the Arabidopsis genes encoding of the been identified or at least for those encoding 2002; PubMed Scopus Google Scholar, P. Plant J. 2004; PubMed Scopus Google Scholar, B. G. M. J. G. Plant Physiol. 2004; 135: PubMed Scopus Google of GLS biosynthesis and by PAPS, In and the of but the of is by a and of and the of the followed and The genes encoding the GLS biosynthesis those and and into the cells in the same is that putative and by and M. Papenbrock J. J.Exp. Bot. 2004; 55: 1809-1820Crossref PubMed Scopus (110) Google Scholar) were classified into where the GLS genes were This suggested that genes are also involved in GLS of putative sulfotransferase gene in assays were conducted using the The recombinant AtSOT18 protein desulfo-allyl GLS allyl GLS in the of of the product was confirmed by mass spectrometry not activity of the gene of the other putative and has been analyzed in the results with desulfo-allyl GLS not confirmed that the genes, which GLS biosynthesis genes after BL-SOM analysis, of to process of this et M. A. A. A. T. 2004; PubMed Scopus Google Scholar) the by strategy to and that was also the which was to involved in E. S. N. 2001; PubMed Scopus Google Scholar). This 2 was GLS biosynthesis genes, the by al. E. S. N. 2001; PubMed Scopus Google the function of this gene was confirmed by etal. J. S. PlantJ. 2004; PubMed Scopus Google Scholar). These that strategy is to elucidate gene functions in the pathway at In a way, based on the results of BL-SOM the functions of genes for GLS biosynthesis and were also A putative gene and putative genes and were together with the GLS biosynthesis genes. The gene a in GLS The whose gene product of the GLS P. Plant J. 2004; PubMed Scopus Google also been as a T. Awazuhara M. Saito K. J. 2003; PubMed Scopus (34) Google Scholar). The not GLS, at least under of of GLS P. Plant J. 2004; PubMed Scopus Google Scholar). this a of P. Plant J. 2004; PubMed Scopus Google and the product also function as in GLS biosynthesis under or for the putative genes clustering with the other GLS has been suggested that be involved in complex formed by and The after and (Fig. is vitro to form a Hence, metabolic by is in to P. Plant J. 2004; PubMed Scopus Google Scholar). The genes and be candidates an are the genes clustered with for identification of and under S is regulated by of and of enzymatic K. PlantPhysiol. 2004; Scholar). For of are regulated by in an by and by formation of complex with The involved in S are in Arabidopsis These genes were on the which with the that S is regulated not only accumulation but also by enzymatic K. PlantPhysiol. 2004; Scholar). Among genes were clustered together with and genes are to be by whose under is considered regulator of this K. PlantPhysiol. 2004; Scholar). The and genes to into genes and were clustered same (Fig. the Among the ATP and genes, and were clustered with and their under S deficiency S assimilation genes, the 1 gene gene are to be under S deficiency and by M.Y. Fujiwara T. Awazuhara M. Kimura T. Noji M. Saito K. Plant J. 2003; 33: 651-663Crossref PubMed Scopus (263) Google V. J. S. M. H. Plant J. 2003; 33: PubMed Scopus Google A. A. P. M. Plant J. 2002; PubMed Scopus Google Scholar). were also in where clustered not putative which are to be by S A. Inoue E. A. T. Takahashi H. PlantPhysiol. 2003; were also in the that they are coordinately regulated with the specific function of each of a gene by this analysis. The ATP of this gene were clustered with genes and that be for the biosynthesis of for the other as with be involved in the of under and is a factor that Y. J. Plant Cell. 2000; 12: PubMed Scopus Google Scholar). In an of the gene that the biosynthesis genes of such as and were Y. J. Plant Cell. 2000; 12: PubMed Scopus Google Scholar). that inthis anthocyanin biosynthesis genes were specifically biosynthesis genes, that is a of anthocyanin biosynthesis T. Nishiyama Y. Hirai M.Y. Yano M. Nakajima J. Awazuhara M. Inoue E. Takahashi H. Goodenowe D.B. Kitayama M. Noji M. Yamazaki M. Saito K. PlantJ. 2005; 42: 218-235Crossref PubMed Scopus (762) Google Scholar). In this and the anthocyanin biosynthesis genes were clustered that and genes regulated by can be by subjected to S deficiency not biosynthesis was not and the anthocyanin biosynthesis genes were clustered that a regulatory can be by when we not on a specific For GLS putative factor genes with GLS biosynthesis genes not These genes are genes GLS In the present we gene-to-gene and and identify a gene function through integrated analysis ofmetabolomics and Our strategy be especially where the gene of interest is and no are in of the O. Kopka J. Saito K. J. P. Roessner-Tunali U. R.N. Plant Sci. 2004; PubMed Scopus Google K. O. Proc. Natl. Acad. Sci. U. S. A. 2004; 101: PubMed Scopus Google Scholar). This approach applicable not only to plants but also to other organisms, and for the identification of novel gene functions. for with with