Gary Benson and Lan Dong
Abstract
One of the less well understood mutational transformations that
act upon DNA is tandem duplication. In this process, a stretch
of DNA is duplicated to produce two or more adjacent copies, resulting
in a tandem repeat. Over time, the copies undergo additional mutations
so that typically, multiple approximate tandem copies are present.
An interesting feature of tandem repeats is that the duplicated
copies are preserved together, making it possible to do ``phylogenetic
analysis'' on a single sequence. This involves using the pattern
of mutations among the copies to determine a minimal or a most
likely history for the repeat. A history tries to describe the
interwoven pattern of substitutions, indels, and duplication events
in such a way as to minimize the number of identical mutations
that arise independently. Because the copies are adjacent and
ordered, the history problem can not be solved by standard phylogeny
algorithms. In this paper, we introduce several versions of the
tandem repeat history problem, develop algorithmic solutions and
evaluate their performance. We also develop ways to visualize
important features of a history with the goal of discovering properties
of the duplication mechanism.