Eugene V. Koonin
National Center for Biotechnology Information,
National Library of Medicine,
National Institutes of Health,
Bethesda MD 20894, USA
Abstract:
About 20 complete genome sequences of cellular life forms
(bacteria, archaea and eukaryotes) are currently available,
and many more are in the pipeline. Considerable comparative analysis
of these genomes has already been performed, and while even more
challenging work lies ahead, it is fair to ask at this juncture
what the impact of this research on biology in general is. In
my opinion, comparative analysis of complete genomes has already
affected our ideas of what biological evolution is to such an
extent that it is appropriate to claim a paradigm shift in evolutionary
biology.
Computer analysis of complete genomes of unicellular organisms shows that protein sequences are, in general, highly conserved in evolution, with at least 70% of them containing ancient conserved regions. This allows us to delineate families of orthologs across a wide phylogenetic range and, in many cases, to predict protein functions with reasonable confidence. Examination of the 'phylogenetic pattern' of each orthologous family, that is, the set of genomes in which the family is represented, shows that only ~100 families, most of which include components of the translation machinery, are conserved in all sequenced genomes; every other family is missing from at least one genome. Thus, horizontal gene transfer and lineage-specific gene loss are not inconsequential evolutionary quirks but rather prevailing forces of evolution, at least in the prokaryotic world. Horizontal transfer and lineage-specific loss of entire genes are complemented by numerous intragenic recombination events that manifest as domain rearrangements at the protein level.
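The notion of a phylogenetic pattern can be made concrete with a small sketch. The Python fragment below is only an illustration under stated assumptions: the genome labels, family names and memberships are hypothetical placeholders, and the helper functions are invented for this example rather than taken from any published pipeline. It encodes each family as a presence/absence string over a fixed list of genomes and picks out the families present in all of them.

```python
# Toy illustration only: genome labels, family names and memberships are
# hypothetical placeholders, not data from the analyses described above.
GENOMES = ["bacterium_A", "bacterium_B", "archaeon_C", "eukaryote_D"]

# Each family maps to the set of genomes that encode at least one member.
FAMILIES = {
    "family_1": {"bacterium_A", "bacterium_B", "archaeon_C", "eukaryote_D"},
    "family_2": {"bacterium_A", "archaeon_C"},
    "family_3": {"archaeon_C", "eukaryote_D"},
    "family_4": {"bacterium_B"},
}


def phylogenetic_pattern(members, genomes=GENOMES):
    """Return a presence/absence string, one character per genome."""
    return "".join("+" if g in members else "-" for g in genomes)


def universal_families(families, genomes=GENOMES):
    """Return the sorted names of families present in every genome."""
    required = set(genomes)
    return sorted(
        name for name, members in families.items() if required <= members
    )


patterns = {name: phylogenetic_pattern(m) for name, m in FAMILIES.items()}
universal = universal_families(FAMILIES)

for name in sorted(patterns):
    print(name, patterns[name])
print("universal:", universal)
```

In this toy data set only one family is found in every genome. The patchy patterns of the others are the kind of distribution that, in real genomes, calls for an explanation in terms of gene loss or horizontal transfer.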
Examination of phylogenetic patterns for families of orthologous proteins also leads to more specific conclusions, some of which may have far-reaching consequences. In particular, it is now clear that the basic DNA replication machineries (that is, the replicative DNA polymerases, primases, helicases, and several other proteins) in bacteria and in archaea/eukaryotes are not orthologous and may have evolved independently. This leads to the hypothesis that the common ancestor of all extant cellular life forms (the so-called cenancestor) did not possess a modern-type, DNA-based replication and expression system, although it did encode advanced translation and transcription machineries and a considerable repertoire of metabolic enzymes. Instead of a dsDNA genome, the cenancestor might have had a mixed system of small RNA and DNA genetic elements that were interconverted via cycles of transcription and reverse transcription. This model seems to account for both the universal and the distinct components of the DNA replication machinery in bacteria and archaea/eukaryotes.
Some relevant recent references
Koonin, E. V., Mushegian, A. R., Galperin, M. Y., Walker, D. R. Comparison of archaeal and bacterial genomes: computer analysis of protein sequences predicts novel functions and suggests a chimeric origin for the archaea. Mol. Microbiol., 25: 619-637, 1997.
Tatusov, R. L., Koonin, E. V., Lipman, D. J. A genomic perspective on protein families. Science, 278: 631-637, 1997.
Makarova, K. S., Aravind, L., Galperin, M. Y., Tatusov, R. L., Wolf, Y. I., Koonin, E. V. Comparative genomics of the archaea: universal and unique protein families. Genome Res., in press, 1999.