Martin Tompa
Abstract
This is an investigation of
methods for finding short motifs that occur in only a fraction
of the input sequences. Unlike local search techniques that may
not reach a global optimum, the method proposed here is guaranteed
to produce the motifs with greatest z-scores. This method is illustrated
for the Ribosome Binding Site Problem, which is to identify the
short mRNA 5' untranslated sequence that is recognized by the
ribosome during initiation of protein synthesis. Experiments were
performed to solve this problem for each of fourteen sequenced
prokaryotes, by applying the method to the full complement of
genes from each. One of the interesting results of this experimentation
is evidence that the recognized sequence of the thermophilic archaea
A. fulgidus, M. jannaschii, M. thermoautotrophicum, and P. horikoshii
may be somewhat different from the well-known Shine-Dalgarno sequence.
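
To make the contrast with local search concrete, the following is a minimal sketch of exhaustive scoring, not the method of this paper. It enumerates every DNA k-mer, counts how many input sequences contain it, and ranks the k-mers by z-score, so the top-ranked k-mer is optimal under the scoring model used. The function name top_motifs_by_zscore, the i.i.d. background model, and the approximation that ignores self-overlap of a word are all assumptions made for illustration.

```python
from itertools import product
from math import sqrt


def top_motifs_by_zscore(seqs, k, top=3):
    """Rank all DNA k-mers by the z-score of the number of sequences containing them.

    Illustrative assumptions (not taken from the paper):
      * background bases are i.i.d. with frequencies estimated from the input;
      * the chance that a sequence contains a word is approximated as
        1 - (1 - p_word) ** (number of start positions), ignoring word self-overlap.
    """
    total = sum(len(s) for s in seqs)
    freq = {b: sum(s.count(b) for s in seqs) / total for b in "ACGT"}

    scored = []
    for kmer in map("".join, product("ACGT", repeat=k)):
        # Probability of the word at one fixed position under the i.i.d. background.
        p_word = 1.0
        for b in kmer:
            p_word *= freq[b]

        # Observed number of sequences that contain the word at least once.
        observed = sum(kmer in s for s in seqs)

        # Approximate per-sequence probability of at least one occurrence.
        probs = [1.0 - (1.0 - p_word) ** max(len(s) - k + 1, 0) for s in seqs]
        mean = sum(probs)
        var = sum(p * (1.0 - p) for p in probs)
        if var > 0:
            scored.append(((observed - mean) / sqrt(var), kmer))

    # Every k-mer was scored, so the head of this list is the global optimum
    # under the assumed model, which a local search cannot promise.
    scored.sort(reverse=True)
    return scored[:top]
```

Calling top_motifs_by_zscore(seqs, 5) on a list of upstream regions returns (z-score, k-mer) pairs in decreasing order of z-score. The cost grows as 4^k, which is why such enumeration is practical only for short motifs.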