Knowledge Discovery in Biological Datasets Using a Hybrid Bayes Classifier/Evolutionary Algorithm

Title: Knowledge Discovery in Biological Datasets Using a Hybrid Bayes Classifier/Evolutionary Algorithm
Publication Type: Journal Article
Year of Publication: 2003
Authors: Michael Raymer, Leslie Kuhn, William Punch
Keywords: Bayes classifier, curse of dimensionality, feature extraction, feature selection, genetic algorithms, pattern classification, protein solvation
Abstract

A key element of many bioinformatics research problems is the extraction of meaningful information from large experimental data sets. Various approaches, including statistical and graph theoretical methods, data mining, and computational pattern recognition, have been applied to this task with varying degrees of success. We have previously shown that a genetic algorithm coupled with a K nearest-neighbors classifier performs well in extracting information about protein-water binding from X-ray crystallographic protein structure data. Using a novel classifier based on the Bayes discriminant function, we present a hybrid algorithm that employs feature selection and extraction to isolate the salient features from large biological data sets. The effectiveness of this algorithm is demonstrated on various data sets, including an important problem in proteomics and protein folding – prediction of water binding sites near a protein surface.

Citation
M. Raymer, L. Kuhn, and W. Punch (2001), “Knowledge Discovery in Biological Datasets Using a Hybrid Bayes Classifier/Evolutionary Algorithm.” Proceedings of the 2nd IEEE International Symposium on Bioinformatics and Bioengineering (BIBE 2001), 236-245.