ISMB99
Computational Genomics: Biological Discovery in Complete Genomes

Anthony R. Kerlavage, Ph.D.
Senior Director, Gene Discovery


Celera Genomics
Rockville, MD USA

Abstract

The field of genomics changed radically with the sequencing of the first complete microbial genome, Haemophilus influenzae, by The Institute for Genomic Research (TIGR) [1]. This project made it apparent that the DNA of entire complex organisms, many megabases in size, could be sequenced accurately and rapidly using a "shotgun" sequencing strategy. Since that time, TIGR and other labs have together completely sequenced the genomes of over 20 microbes. Knowing the complete genome sequences of the pathogens in this group will open up exciting opportunities to develop novel pharmaceuticals, biologics, and vaccines. The genomes of two important eukaryotic model organisms, S. cerevisiae [2] and C. elegans [3], have also been completed. In addition, several chromosomes from P. falciparum and A. thaliana are finished, and these entire genomes will soon be complete.

Across all of these species, nearly half of the candidate genes that have been identified cannot be assigned a definitive biological role, leaving open a tremendous opportunity for functional as well as computational genomics. On the other hand, a combination of molecular sequence analysis techniques has yielded new insights into the metabolic pathways, cell-surface receptor and transporter complement, and phylogeny of these organisms. The availability of these complete genomes makes comparative genomic analysis possible, leading to the discovery of synteny among organisms as well as of the regulatory and developmental networks that control gene expression. The integration and semantic representation of this wealth of data will be critical to our ability to understand it.

At Celera Genomics, our goal is to become the definitive source of genomic and associated medical information that scientists will use to develop a better understanding of the biological processes in humans and agriculturally important organisms, and to deliver improved healthcare in the future. Using breakthrough DNA sequencing technology, we are operating a genomic sequencing facility with an expected capacity greater than that of the current combined world output [4]. The early focus at Celera will be on completing the genomes of human, mouse, Drosophila, and rice. While the size of these genomes and the speed with which they will be sequenced will present formidable computational challenges for the discovery and characterization of genes, they also represent an enormous opportunity to advance the complete understanding of living systems.

References

  1. Whole-Genome Random Sequencing and Assembly of Haemophilus influenzae Rd. Fleischmann, RD et al. Science 269:496-512, 1995.
  2. The Yeast Genome Directory. Nature 387(Suppl):5-105, 1997.
  3. Genome Sequence of the Nematode C. elegans: A Platform for Investigating Biology. The C. elegans Sequencing Consortium. Science 282:2012-2018, 1998.
  4. Shotgun Sequencing of the Human Genome. Venter, JC, Adams, MD, Sutton, GG, Kerlavage, AR, Smith, HO, and Hunkapiller, M. Science 280:1540-1542, 1998.

 
