ISMB99
Analysis of ribosomal RNA sequences by combinatorial clustering

Poe Xing1, Casimir Kulikowski1, Ilya Muchnik1, Inna Dubchak2, Denise M Wolf2, Sylvia Spengler2, and Manfred Zorn2

  1. DIMACS and CS Department, Rutgers University, Piscataway, NJ 08855-1179, USA
    xingpoe@cs.rutgers.edu, kulikows@cs.rutgers.edu,
    muchnik@dimacs.rutgers.edu
  2. National Energy Research Scientific Computing Center,
    Lawrence Berkeley National Laboratory, MS 84-171, Berkeley, CA 94720, USA
    ildubchak@lbl.gov, dmwolf@lbl.gov, sjspengler@lbl.gov, mdzorn@lbl.gov

Abstract
We present an analysis of multi-aligned eukaryotic and prokaryotic small subunit rRNA sequences using a novel segmentation and clustering procedure capable of extracting subsets of sequences that share common sequence features. This procedure consists of: i) segmentation of aligned sequences using a dynamic programming procedure, and subsequent identification of likely conserved segments; ii) for each putative conserved segment, extraction of a locally homogeneous cluster using a novel polynomial procedure; and iii) intersection of the clusters associated with each conserved segment. Aside from their utility in processing large gap-filled multi-alignments, these algorithms can be applied to a broad spectrum of rRNA analysis functions such as subalignment, phylogenetic subtree extraction and construction, and organism tree placement, and can serve as a framework to organize sequence data in an efficient and easily searchable manner. The sequence classification we obtained using the method presented here shows a remarkable consistency with the independently constructed eukaryotic phylogenetic tree.
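
As an illustration only, steps i) and iii) might be sketched as follows. The abstract does not specify the segmentation criterion, so this sketch assumes one: per-column Shannon entropy as the conservation score, and a dynamic program that splits the columns into k contiguous segments minimizing the within-segment squared deviation of that score. The names column_entropy, segment and intersect_clusters are hypothetical, and the polynomial cluster-extraction procedure of step ii) is not reproduced because the abstract gives no details of it.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of the symbols in one alignment column.
    Assumption: entropy stands in for the paper's conservation measure."""
    counts = Counter(column)
    total = len(column)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def segment(values, k):
    """Split `values` into k contiguous segments by dynamic programming,
    minimizing the total within-segment sum of squared deviations.
    Returns a list of half-open (start, end) index pairs."""
    n = len(values)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(values)")
    # Prefix sums give the cost of any segment in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, v in enumerate(values):
        s1[i + 1] = s1[i] + v
        s2[i + 1] = s2[i] + v * v

    def cost(i, j):
        # Sum of squared deviations from the mean over values[i:j].
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for seg in range(1, k + 1):
        for j in range(seg, n + 1):
            for i in range(seg - 1, j):
                c = best[seg - 1][i] + cost(i, j)
                if c < best[seg][j]:
                    best[seg][j] = c
                    back[seg][j] = i

    # Trace the optimal boundaries back from the last column.
    bounds = []
    j = n
    for seg in range(k, 0, -1):
        i = back[seg][j]
        bounds.append((i, j))
        j = i
    return bounds[::-1]


def intersect_clusters(clusters):
    """Step iii): sequences present in the cluster of every conserved segment."""
    clusters = [set(c) for c in clusters]
    return set.intersection(*clusters) if clusters else set()


if __name__ == "__main__":
    # Toy alignment (made-up data): rows are sequences, columns are positions.
    alignment = ["ACGTTA", "ACGATC", "ACGCAG", "ACGGCT"]
    scores = [column_entropy(col) for col in zip(*alignment)]
    print(segment(scores, 2))
    print(intersect_clusters([{"s1", "s2", "s3"}, {"s2", "s3", "s4"}]))
```

Segments with a low mean score would then be the candidates for conserved segments. The cubic-time loop is chosen for readability, not efficiency.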
