ISMB99
Comparative genomics: Is it changing the paradigm of evolutionary biology?

Eugene V. Koonin
National Center for Biotechnology Information,
National Library of Medicine,
National Institutes of Health,
Bethesda MD 20894, USA

 

Abstract:
About 20 complete genome sequences of cellular life forms (bacteria, archaea and eukaryotes) are currently available, and many more are in the pipeline. Considerable comparative analysis of these genomes has already been performed, and while even more challenging work lies ahead, it is fair to ask at this juncture what impact this research has had on biology in general. In my opinion, comparative analysis of complete genomes has already affected our ideas of what biological evolution is to such an extent that it is appropriate to claim a paradigm shift in evolutionary biology.

Computer analysis of complete genomes of unicellular organisms shows that protein sequences are in general highly conserved in evolution, with at least 70% of them containing ancient conserved regions. This allows us to delineate families of orthologs across a wide phylogenetic range and, in many cases, to predict protein functions with reasonable confidence. Examination of the 'phylogenetic pattern' for these orthologous families shows that only ~100 families, most of which include components of the translation machinery, are universally conserved in all sequenced genomes. Thus, horizontal gene transfer and lineage-specific gene loss are not inconsequential evolutionary quirks but rather prevailing forces of evolution, at least in the prokaryotic world. Horizontal transfer and lineage-specific loss of entire genes are complemented by numerous intragenic recombination events that manifest as domain rearrangements at the protein level.
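To make the notion of a phylogenetic pattern concrete, here is a minimal sketch in Python. It treats a pattern as the presence or absence of an orthologous family in each genome and then picks out the universally conserved families. The genome labels, family names and memberships are hypothetical toy data, not results from the analysis described here, and the function names are illustrative assumptions.

```python
from typing import Dict, List, Set

# Hypothetical toy data: genome labels and family names are illustrative only.
GENOMES: List[str] = ["bacterium_A", "bacterium_B", "archaeon_A", "eukaryote_A"]

# Each orthologous family maps to the set of genomes in which it is represented.
FAMILIES: Dict[str, Set[str]] = {
    "family_1": {"bacterium_A", "bacterium_B", "archaeon_A", "eukaryote_A"},
    "family_2": {"bacterium_A", "bacterium_B"},
    "family_3": {"archaeon_A", "eukaryote_A"},
    "family_4": {"bacterium_A", "archaeon_A"},
}


def phylogenetic_pattern(members: Set[str], genomes: List[str]) -> str:
    """Encode presence (1) or absence (0) of a family in each genome, in order."""
    return "".join("1" if genome in members else "0" for genome in genomes)


def universal_families(families: Dict[str, Set[str]], genomes: List[str]) -> List[str]:
    """Return the names of families represented in every genome."""
    all_genomes = set(genomes)
    return sorted(name for name, members in families.items() if all_genomes <= members)


# Pattern string for every family, e.g. "1100" means present only in the two bacteria.
patterns = {name: phylogenetic_pattern(members, GENOMES) for name, members in FAMILIES.items()}

if __name__ == "__main__":
    for name, pattern in sorted(patterns.items()):
        print(name, pattern)
    print("universal:", universal_families(FAMILIES, GENOMES))
```

In this toy set only one family is present in all four genomes. The patchy patterns of the others are the kind of distribution that horizontal transfer or lineage-specific loss could produce, although a pattern alone does not say which of the two occurred.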

Examination of phylogenetic patterns for families of orthologous proteins also yields more specific conclusions, some of which may have far-reaching consequences. In particular, it is now clear that the basic DNA replication machineries (that is, the replicative DNA polymerases, primases, helicases, and several other proteins) in bacteria and in archaea/eukaryotes are not orthologous and may have evolved independently. This leads to the hypothesis that the common ancestor of all extant cellular life forms (the so-called cenancestor) did not possess a modern-type, DNA-based replication and expression system, although it did encode advanced translation and transcription machineries and a considerable repertoire of metabolic enzymes. Instead of a dsDNA genome, the cenancestor might have had a mixed system of small RNA and DNA genetic elements that were interconverted via cycles of transcription and reverse transcription. This model seems to account for both the universal and the distinct components of the DNA replication machinery in bacteria and archaea/eukaryotes.

Some relevant recent references

Koonin, E. V., Mushegian, A. R., Galperin, M. Y., Walker, D. R. Comparison of archaeal and bacterial genomes: computer analysis of protein sequences predicts novel functions and suggests a chimeric origin for the archaea. Mol. Microbiol. 25: 619-637, 1997.

Tatusov, R. L., Koonin, E. V., Lipman, D. J. A genomic perspective on protein families. Science 278: 631-637, 1997.

Makarova, K. S., Aravind, L., Galperin, M. Y., Tatusov, R. L., Wolf, Y. I., Koonin, E. V. Comparative genomics of the archaea: universal and unique protein families. Genome Res., in press, 1999.

 
