Protein Fold Class Prediction: New Methods of Statistical Classification

Janet Grassmann, Martin Reczko, Sandor Suhai and Lutz Edler

Department of Statistics, Stanford University, Palo Alto, CA 94305 USA
Synaptic Ltd Aristotelous 313, 13671 Acharnai, Greece
Department of Molecular Biophysics and
Biostatistics Unit, German Cancer Research Center, D-69120 Heidelberg

Feed-forward neural networks are compared with standard and new statistical classification procedures for the classification of proteins. We applied logistic regression, an additive model, and projection pursuit regression from the methods based on a posteriori probabilities; linear, quadratic, and flexible discriminant analysis from the methods based on class-conditional probabilities; and the K-nearest-neighbors classification rule. The apparent error rate obtained with the training sample (n=143), the test error rate obtained with the test sample (n=125), and the 10-fold cross-validation error were calculated. We conclude that some of the standard statistical methods are potent competitors to the more flexible tools of machine learning.
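
The three error measures named in the abstract can be illustrated with a minimal sketch. It is not the authors' code or data: it assumes synthetic two-class Gaussian samples in place of protein features, reuses only the sample sizes (143 and 125) from the abstract, and uses a plain K-nearest-neighbors rule with K=3 as the classifier. The function names (`knn_predict`, `error_rate`, `cv_error`, `make_sample`) are hypothetical, and running cross-validation on the training sample alone is likewise an assumption.

```python
import random
from collections import Counter


def knn_predict(train_x, train_y, x, k=3):
    """Majority vote among the k training points closest to x (squared Euclidean distance)."""
    order = sorted(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], x)),
    )
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def error_rate(train_x, train_y, eval_x, eval_y, k=3):
    """Fraction of evaluation points misclassified by a rule built on the training points."""
    wrong = sum(knn_predict(train_x, train_y, x, k) != y for x, y in zip(eval_x, eval_y))
    return wrong / len(eval_x)


def cv_error(xs, ys, folds=10, k=3, seed=0):
    """Cross-validation error: each point is predicted once, by a rule that never saw it."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    wrong = 0
    for f in range(folds):
        held_out = set(idx[f::folds])  # every folds-th shuffled index forms one fold
        keep = [i for i in idx if i not in held_out]
        fold_x = [xs[i] for i in keep]
        fold_y = [ys[i] for i in keep]
        wrong += sum(knn_predict(fold_x, fold_y, xs[i], k) != ys[i] for i in held_out)
    return wrong / len(xs)


def make_sample(n, rng):
    """Synthetic stand-in data: class 0 centred at (0, 0), class 1 centred at (2, 2)."""
    xs, ys = [], []
    for i in range(n):
        label = i % 2
        center = 2.0 * label
        xs.append((rng.gauss(center, 1.0), rng.gauss(center, 1.0)))
        ys.append(label)
    return xs, ys


rng = random.Random(0)
train_x, train_y = make_sample(143, rng)  # training sample size from the abstract
test_x, test_y = make_sample(125, rng)    # test sample size from the abstract

apparent = error_rate(train_x, train_y, train_x, train_y)  # evaluated on the training data itself
test = error_rate(train_x, train_y, test_x, test_y)        # evaluated on unseen data
cv = cv_error(train_x, train_y, folds=10)                  # 10-fold cross-validation

print(f"apparent error: {apparent:.3f}")
print(f"test error:     {test:.3f}")
print(f"10-fold CV:     {cv:.3f}")
```

The apparent error is usually optimistic because each training point takes part in its own prediction, which is why the test and cross-validation errors are reported alongside it.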


-> ISMB 99