Analyzing genomes with cumulative skew diagrams | Andrey Grigoriev | Nucleic Acids Research | 1998 | A novel method of cumulative diagrams shows that the nucleotide composition of a microbial chromosome changes at two points separated by about half its length. These points coincide with the sites of the replication origin and terminus in all bacteria for which such sites are known. The leading strand is found to contain more guanine than cytosine residues, and this bias is used to predict origin and terminus locations in other bacterial and archaeal genomes. Local changes, visible as diagram distortions, may represent recent genome rearrangements, as demonstrated for two strains of Escherichia coli. Analysis of the diagrams of viral and mitochondrial genomes suggests a link between the base composition bias and the time DNA spends in a single-stranded state during replication.
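A cumulative skew diagram of this kind can be sketched as a running tally along one strand: add 1 for every G and subtract 1 for every C. This is a minimal sketch assuming a plain per-base G minus C count with no windowing; the paper's exact procedure (and any companion diagram for other base pairs) is not reproduced, and the helper names are hypothetical. Because the leading strand carries excess G, the tally tends to fall over the half where the read strand is lagging and rise where it is leading, so the global minimum suggests the origin and the global maximum the terminus.

```python
from typing import List, Tuple


def cumulative_gc_skew(seq: str) -> List[int]:
    """Running (#G - #C) tally; element i summarizes the first i bases."""
    skew = [0]
    for base in seq.upper():
        # A, T and ambiguous bases (e.g. N) leave the tally unchanged.
        step = 1 if base == "G" else -1 if base == "C" else 0
        skew.append(skew[-1] + step)
    return skew


def predict_origin_terminus(seq: str) -> Tuple[int, int]:
    """Hypothetical helper: index of the first global minimum (origin
    candidate) and of the first global maximum (terminus candidate).
    Indices refer to the linearized sequence of a circular chromosome."""
    skew = cumulative_gc_skew(seq)
    origin = skew.index(min(skew))
    terminus = skew.index(max(skew))
    return origin, terminus


if __name__ == "__main__":
    print(cumulative_gc_skew("CATGGGCATCGGCCATACGCC"))
    print(predict_origin_terminus("CCCGGGGGGCCC"))
```

On a real chromosome the curve is noisy, so the extremes are read off the overall V shape rather than single positions; local kinks correspond to the "diagram distortions" mentioned above.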
A relationship between gene expression and protein interactions on the proteome scale: analysis of the bacteriophage T7 and the yeast Saccharomyces cerevisiae | Andrey Grigoriev | Nucleic Acids Research | 2001 | The relationship between the similarity of expression patterns for a pair of genes and the interaction of the proteins they encode is demonstrated both for the simple genome of the bacteriophage T7 and for the considerably more complex genome of the yeast Saccharomyces cerevisiae. Statistical analysis of large-scale gene expression and protein interaction data shows that protein pairs encoded by co-expressed genes interact with each other more frequently than random protein pairs do. Furthermore, the mean similarity of expression profiles is significantly higher for genes encoding interacting protein pairs than for random pairs. Such coupled analysis of gene expression and protein interaction data may allow the results of large-scale gene expression and protein interaction screens to be evaluated, as demonstrated for several publicly available datasets. The role of this link between expression and interaction in the evolution from monomeric to oligomeric protein structures is also discussed.
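The comparison of mean expression similarity for interacting versus random pairs can be sketched as follows. This assumes Pearson correlation as the similarity measure and uniformly sampled gene pairs as the background; the abstract does not state which measure or statistical test was used, and all function names here are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # assumes neither profile is constant


def mean_similarity(profiles: Dict[str, List[float]],
                    pairs: List[Tuple[str, str]]) -> float:
    """Mean profile correlation over a list of gene pairs."""
    return sum(pearson(profiles[a], profiles[b]) for a, b in pairs) / len(pairs)


def random_pairs(genes: List[str], k: int, seed: int = 0) -> List[Tuple[str, str]]:
    """k random pairs of distinct genes, used as the background set."""
    rng = random.Random(seed)
    return [tuple(rng.sample(genes, 2)) for _ in range(k)]


if __name__ == "__main__":
    # Toy profiles over four hypothetical conditions.
    profiles = {"g1": [1, 2, 3, 4], "g2": [2, 4, 6, 8],
                "g3": [4, 3, 2, 1], "g4": [1, 3, 2, 4]}
    interacting = [("g1", "g2")]
    background = random_pairs(list(profiles), k=100)
    print(mean_similarity(profiles, interacting), mean_similarity(profiles, background))
```

A higher mean for the interacting set than for many resampled background sets is the pattern the abstract reports; a permutation test over repeated background draws is one simple way to attach a p-value.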
Structural variants in 3000 rice genomes | Investigation of large structural variants (SVs) is a challenging yet important task in understanding trait differences in highly repetitive genomes. Combining different bioinformatic approaches for SV detection, we analyzed whole-genome sequencing data from 3000 rice genomes and identified 63 million individual SV calls that grouped into 1.5 million allelic variants. We found enrichment of long SVs in promoters and an excess of shorter variants in 5' UTRs. Across the rice genomes, we identified regions of high SV frequency enriched in stress response genes. We demonstrated how SVs may help in finding causative variants in genome-wide association analysis. These new insights into rice genome biology are valuable for understanding the effects SVs have on gene function, with the prospect of identifying novel agronomically important alleles that can be utilized to improve cultivated rice.
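The abstract does not say how the 63 million individual calls were merged into 1.5 million allelic variants. One common way to group SV calls is by reciprocal overlap; the sketch below illustrates that technique only, assuming a 50% threshold, matching chromosome and SV type, and intervals of positive length. It is not presented as the paper's method, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SVCall:
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # exclusive; assumed > start
    svtype: str  # e.g. "DEL", "INS"


def reciprocal_overlap(a: SVCall, b: SVCall) -> float:
    """Overlap length divided by the longer call's length, which equals the
    smaller of the two per-call overlap fractions."""
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return 0.0
    return overlap / max(a.end - a.start, b.end - b.start)


def group_calls(calls: List[SVCall], min_ro: float = 0.5) -> List[List[SVCall]]:
    """Greedy grouping: each call joins the first group whose representative
    (its first member) has the same chrom and type and >= min_ro reciprocal
    overlap; otherwise it starts a new group. Quadratic, so only a sketch;
    millions of calls would need an interval index."""
    groups: List[List[SVCall]] = []
    for call in sorted(calls, key=lambda c: (c.chrom, c.svtype, c.start)):
        for group in groups:
            rep = group[0]
            if (rep.chrom == call.chrom and rep.svtype == call.svtype
                    and reciprocal_overlap(rep, call) >= min_ro):
                group.append(call)
                break
        else:
            groups.append([call])
    return groups
```

For example, deletions at chr1:100-200 and chr1:110-210 overlap by 90 bp out of 100 bp each and fall into one group, while a deletion at chr1:500-600 forms its own group.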
On the number of protein-protein interactions in the yeast proteome | Andrey Grigoriev | Nucleic Acids Research | 2003 | Using two different approaches, we estimated that there are on average about five interacting partners per protein in the proteome of the yeast Saccharomyces cerevisiae. In the first approach, we used a novel method to model sampling overlap by a Bernoulli process, compared the results of two independent yeast two-hybrid interaction screens and tested the robustness of the estimate. The most stable estimate, five interactors per protein, was obtained when the three most highly connected nodes in the protein interaction network were removed from the analysis (eight interactors per protein if those nodes were kept). In the second approach, we analysed a published high-confidence subset of putative interaction data obtained from multiple sources, including large-scale two-hybrid screens, complex purifications, synthetic lethals, correlated gene expression, computational predictions and previous annotations. Strikingly, the estimate was again five interactors per protein. These estimates suggest approximately 16,000 to 26,000 distinct interaction pairs in yeast, excluding homotypic interactions. We also discuss approaches to estimating the rate of homotypic interactions.
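The overlap argument can be illustrated with the standard estimate that follows from independent Bernoulli sampling. If each of N true interaction pairs is detected by screen 1 with probability p1 and by screen 2 with probability p2, then the expected screen sizes are n1 = N·p1 and n2 = N·p2 and the expected overlap is m = N·p1·p2, giving N ≈ n1·n2 / m. This is a minimal sketch of that textbook estimator plus a hub-removal step; the paper's exact model and robustness tests are not reproduced, the function names are hypothetical, and the numbers below are invented for illustration.

```python
from collections import Counter
from typing import List, Tuple


def estimate_total_pairs(n1: int, n2: int, overlap: int) -> float:
    """Overlap estimate of the total number of interaction pairs N,
    assuming independent Bernoulli detection in the two screens."""
    if overlap == 0:
        raise ValueError("no shared pairs: the estimate is undefined")
    return n1 * n2 / overlap


def mean_partners(total_pairs: float, n_proteins: int) -> float:
    """Average interactors per protein: each heterotypic pair adds one
    partner to each of its two proteins."""
    return 2 * total_pairs / n_proteins


def drop_top_hubs(edges: List[Tuple[str, str]], k: int) -> List[Tuple[str, str]]:
    """Remove every edge touching the k most highly connected proteins."""
    degree: Counter = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    hubs = {protein for protein, _ in degree.most_common(k)}
    return [(a, b) for a, b in edges if a not in hubs and b not in hubs]


if __name__ == "__main__":
    n_total = estimate_total_pairs(1000, 800, 100)  # invented screen sizes
    print(n_total, mean_partners(n_total, 4000))
```

Hub removal matters because a few very highly connected nodes can dominate the overlap between screens, which is consistent with the abstract's observation that the estimate shifts from eight to five interactors per protein when three such nodes are dropped.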
Protein domains correlate strongly with exons in multiple eukaryotic genomes – evidence of exon shuffling? | Mingyi Liu, Andrey Grigoriev | Trends in Genetics | 2004