Conserved sequence motifs in the initiator proteins for rolling circle DNA replication encoded by diverse replicons from eubacteria, eucaryotes and archaebacteria
An amino acid motif was identified that consists of the sequence HisHydrHisHydrHydrHydr (Hydr = bulky hydrophobic residue) and is conserved in two vast classes of proteins, one of which is involved in initiation and termination of rolling circle DNA replication, or RCR (Rep proteins), and the other in mobilization (conjugal transfer) of plasmid DNA (Mob proteins). Based on analogies with metalloenzymes, it is hypothesized that the two conserved His residues in this motif may be involved in metal ion coordination required for the activity of the Rep and Mob proteins. Rep proteins contained two additional conserved motifs, one of which was located upstream, and the other downstream from the 'two His' motif. The C-terminal motif encompassed the Tyr residue(s) forming the covalent link with nicked DNA. Mob proteins were characterized by the opposite orientation of the conserved motifs, with the (putative) DNA-linking Tyr being located near their N-termini. Both Rep and Mob protein classes further split into several distinct families. Although it was not possible to find a motif or pattern that would be unique for the entire Rep or Mob class, unique patterns were derived for large subsets of the proteins of each class. These observations allowed the prediction of the amino acid residues involved in DNA nicking, which is required for the initiation of RCR or conjugal transfer of single-stranded (ss) DNA, in Rep and Mob proteins encoded by a number of replicons of highly diverse size, structure and origin. It is conjectured that recombination has played a major part in the dissemination of genes encoding related Rep or Mob proteins among the replicons exploiting RCR. It is speculated that the eucaryotic small ssDNA replicons encoding proteins with the conserved RCR motifs and replicating via RCR-related mechanisms, such as geminiviruses and parvoviruses, may have evolved from eubacterial replicons.
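As an illustration only, the 'two His' motif HisHydrHisHydrHydrHydr can be written as a regular expression over one-letter amino acid codes. The sketch below is not the search procedure used in the study. The abstract does not list which residues count as bulky hydrophobic, so the set I, L, V, M, F, Y, W is an assumption, and the names `HYDR`, `TWO_HIS_MOTIF` and `find_two_his_motif` as well as the toy sequence are hypothetical.

```python
import re

# Assumption: "bulky hydrophobic" is taken here as I, L, V, M, F, Y, W.
# The abstract does not enumerate the residues, so this set is illustrative.
HYDR = "ILVMFYW"

# His-Hydr-His-Hydr-Hydr-Hydr expressed as a regular expression.
TWO_HIS_MOTIF = re.compile(f"H[{HYDR}]H[{HYDR}]{{3}}")


def find_two_his_motif(sequence):
    """Return (0-based start, matched text) for each match in a protein sequence."""
    return [(m.start(), m.group()) for m in TWO_HIS_MOTIF.finditer(sequence.upper())]


# Hypothetical toy sequence, not taken from any real Rep or Mob protein.
toy = "MKTAHLHVILGSKHAHGGG"
print(find_two_his_motif(toy))
```

A strict pattern of this kind misses diverged copies of the motif and matches unrelated proteins by chance, so a hit is at most a candidate for closer inspection alongside the flanking conserved motifs.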
Comparison of archaeal and bacterial genomes: computer analysis of protein sequences predicts novel functions and suggests a chimeric origin for the archaea
Protein sequences encoded in three complete bacterial genomes, those of Haemophilus influenzae, Mycoplasma genitalium and Synechocystis sp., and the first available archaeal genome sequence, that of Methanococcus jannaschii, were analysed using the BLAST2 algorithm and methods for amino acid motif detection. Between 75% and 90% of the predicted proteins encoded in each of the bacterial genomes and 73% of the M. jannaschii proteins showed significant sequence similarity to proteins from other species. The fraction of bacterial and archaeal proteins containing regions conserved over long phylogenetic distances is nearly the same and close to 70%. Functions of 70-85% of the bacterial proteins and about 70% of the archaeal proteins were predicted with varying precision. This contrasts with the previous report that more than half of the archaeal proteins have no homologues and shows that, with more sensitive methods and detailed analysis of conserved motifs, archaeal genomes become as amenable to meaningful interpretation by computer as bacterial genomes. The analysis of conserved motifs resulted in the prediction of a number of previously undetected functions of bacterial and archaeal proteins and in the identification of novel protein families. In spite of the generally high conservation of protein sequences, orthologues of 25% or less of the M. jannaschii genes were detected in each individual completely sequenced genome, supporting the uniqueness of archaea as a distinct domain of life. About 53% of the M. jannaschii proteins belong to families of paralogues, a fraction similar to that in bacteria with larger genomes, such as Synechocystis sp. and Escherichia coli, but higher than that in H. influenzae, which has approximately the same number of genes as M. jannaschii. Certain groups of proteins, e.g. molecular chaperones and DNA repair enzymes, thought to be ubiquitous and represented in the minimal gene set derived by bacterial genome comparison, are missing in M. jannaschii, indicating massive non-orthologous displacement of genes responsible for essential functions. An unexpectedly large fraction of the M. jannaschii gene products, 44%, shows significantly higher similarity to bacterial than to eukaryotic proteins, compared with 13% that have eukaryotic proteins as their closest homologues (the rest of the proteins show approximately the same level of similarity to bacterial and eukaryotic homologues or have no homologues). Proteins involved in translation, transcription, replication and protein secretion are most closely related to eukaryotic proteins, whereas metabolic enzymes, metabolite uptake systems, enzymes for cell wall biosynthesis and many uncharacterized proteins appear to be 'bacterial'. A similar prevalence of proteins of apparent bacterial origin was observed among the currently available sequences from the distantly related archaeal genus, Sulfolobus. It is likely that the evolution of archaea included at least one major merger between ancestral cells from the bacterial lineage and the lineage leading to the eukaryotic nucleocytoplasm.