Miguel A. Andrade
Abstract
In this work I present an algorithm for deriving position-specific
protein functional annotations. The input is based on the results
of a sequence similarity search of a query sequence against a
sequence database. Strings of words are extracted from the descriptions
of the proteins, and the correlation between amino acid conservation
and the sharing of a descriptor among proteins is used to compute
a score that indicates which descriptor is likely to best describe
the function of each particular residue. Analysis of the score
curves and comparison of different functions allow easy detection
of parts of the sequence associated with different functions.
Different levels of functional specificity can be compared, allowing
the choice of the one that best suits the function of the protein.
Immediate applications of this algorithm are support for (automated)
methods of protein functional annotation and database coherency
checking.
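
The idea summarized above can be illustrated with a toy sketch. It is not
the scoring function of this work, which the abstract does not specify.
The sketch assumes that each database hit is given as a sequence already
aligned to the query (same length, '-' for gaps) together with its
description line, and it uses a deliberately simple stand-in score: for
each descriptor and each query position, the fraction of hits carrying
the descriptor that conserve the query residue, minus the same fraction
among hits lacking it. The names descriptors and position_scores and the
max_len parameter are hypothetical.

```python
from collections import defaultdict


def descriptors(description, max_len=2):
    """Return the set of word strings (1 to max_len consecutive words)."""
    words = description.lower().split()
    return {
        " ".join(words[i:i + n])
        for n in range(1, max_len + 1)
        for i in range(len(words) - n + 1)
    }


def position_scores(query, hits, max_len=2):
    """Toy per-position score curve for every descriptor.

    hits is a list of (aligned_sequence, description) pairs; each aligned
    sequence is assumed to have the same length as query.
    """
    # conserved[h][i] is True when hit h carries the query residue at position i.
    conserved = [[a == q for a, q in zip(aligned, query)] for aligned, _ in hits]

    # Map each descriptor to the indices of the hits whose description contains it.
    members = defaultdict(list)
    for h, (_, description) in enumerate(hits):
        for d in descriptors(description, max_len):
            members[d].append(h)

    scores = {}
    for d, with_d in members.items():
        without_d = [h for h in range(len(hits)) if h not in with_d]
        curve = []
        for i in range(len(query)):
            frac_in = sum(conserved[h][i] for h in with_d) / len(with_d)
            # Assumption: if every hit carries the descriptor, the baseline is 0.
            frac_out = (
                sum(conserved[h][i] for h in without_d) / len(without_d)
                if without_d
                else 0.0
            )
            curve.append(frac_in - frac_out)
        scores[d] = curve
    return scores
```

With such curves, a descriptor whose score is high over one stretch of the
query and low elsewhere suggests a function localized to that stretch, and
curves for descriptors of different specificity (for example one-word versus
two-word strings) can be compared position by position.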