Poe Xing¹, Casimir Kulikowski¹, Ilya Muchnik¹, Inna Dubchak², Denise M Wolf², Sylvia Spengler², and Manfred Zorn²
Abstract
We present an analysis of multiply aligned
eukaryotic and prokaryotic small-subunit rRNA sequences using
a novel segmentation and clustering procedure capable of extracting
subsets of sequences that share common sequence features. This
procedure consists of: i) segmentation of aligned sequences using
a dynamic programming procedure, and subsequent identification
of likely conserved segments; ii) for each putative conserved
segment, extraction of a locally homogeneous cluster using a novel
polynomial-time procedure; and iii) intersection of clusters associated
with each conserved segment. Beyond their utility in processing
large, gap-rich multiple alignments, these algorithms can be applied
to a broad range of rRNA analysis tasks, such as subalignment,
phylogenetic subtree extraction and construction, and placement of
organisms in a tree, and they can serve as a framework for organizing
sequence data in an efficient, easily searchable manner. The sequence
classification obtained with this method is remarkably consistent
with the independently constructed eukaryotic phylogenetic tree.
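The three-step procedure summarized above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Several pieces are hypothetical choices made only for the example: the per-column score (Shannon entropy, with gaps counted as a symbol), the segmentation cost (a piecewise-constant least-squares fit solved by dynamic programming), the mean-entropy threshold for calling a segment "conserved", and the consensus-distance rule that stands in for the paper's polynomial-time cluster extraction. The function names are likewise hypothetical.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of one alignment column; '-' counts as a symbol."""
    counts = Counter(column)
    n = len(column)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def segment_dp(scores, k):
    """Step i (assumed cost): split scores into k contiguous segments that
    minimize total within-segment squared deviation from the segment mean.
    Returns half-open (start, end) intervals."""
    n = len(scores)
    s, s2 = [0.0] * (n + 1), [0.0] * (n + 1)  # prefix sums for O(1) cost
    for i, x in enumerate(scores):
        s[i + 1] = s[i] + x
        s2[i + 1] = s2[i] + x * x

    def cost(i, j):  # squared deviation of scores[i:j] from its mean
        tot = s[j] - s[i]
        return (s2[j] - s2[i]) - tot * tot / (j - i)

    inf = float("inf")
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for seg in range(1, k + 1):
        for j in range(seg, n + 1):
            for i in range(seg - 1, j):
                c = best[seg - 1][i] + cost(i, j)
                if c < best[seg][j]:
                    best[seg][j], back[seg][j] = c, i
    bounds, j = [], n  # trace the optimal boundaries back from the end
    for seg in range(k, 0, -1):
        i = back[seg][j]
        bounds.append((i, j))
        j = i
    return bounds[::-1]


def conserved_segments(alignment, k, max_mean_entropy):
    """Keep segments whose mean column entropy is at most the (assumed) threshold."""
    scores = [column_entropy(col) for col in zip(*alignment)]
    return [(i, j) for i, j in segment_dp(scores, k)
            if sum(scores[i:j]) / (j - i) <= max_mean_entropy]


def segment_cluster(alignment, seg, max_mismatch):
    """Step ii, hypothetical stand-in for the paper's cluster extraction:
    indices of sequences within max_mismatch of the segment consensus."""
    i, j = seg
    consensus = [Counter(col).most_common(1)[0][0]
                 for col in zip(*(s[i:j] for s in alignment))]
    return {idx for idx, s in enumerate(alignment)
            if sum(a != b for a, b in zip(s[i:j], consensus)) <= max_mismatch}


def common_cluster(alignment, k, max_mean_entropy, max_mismatch):
    """Step iii: intersect the clusters found for every conserved segment."""
    clusters = [segment_cluster(alignment, seg, max_mismatch)
                for seg in conserved_segments(alignment, k, max_mean_entropy)]
    return set.intersection(*clusters) if clusters else set()


if __name__ == "__main__":
    aln = ["ACGTAC", "ACGGCA", "ACGCAG", "ACTAGT"]
    print(conserved_segments(aln, 2, 0.5))   # [(0, 3)]
    print(common_cluster(aln, 2, 0.5, 0))    # {0, 1, 2}
```

In this toy alignment the first three columns are nearly invariant, so the dynamic program places a boundary after column 3 and only that segment passes the threshold; the fourth sequence differs from the consensus there and drops out of the cluster. The exact-optimum segmentation costs O(k·n²) time, which is adequate for a sketch; the paper's own segmentation and clustering criteria are not reproduced here.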