Improving Remote Homology Detection Using Sequence Properties and Position Specific Scoring Matrices

Title: Improving Remote Homology Detection Using Sequence Properties and Position Specific Scoring Matrices
Publication Type: Conference Paper
Year of Publication: 2009
Authors: Gina Cooper, Michael Raymer
Conference Name: The 2009 International Conference on Bioinformatics and Computational Biology (BIOCOMP 09)
Conference Location: Las Vegas, Nevada
Abstract

Understanding the structure and function of proteins is a key part of understanding biological systems. Although proteins are complex biological macromolecules, they are made up of only 20 basic building blocks known as amino acids, so the makeup of a protein can be described as a sequence of amino acids. One of the most important tools in modern bioinformatics is the ability to search for biological sequences (such as protein sequences) that are similar to a given query sequence. Many tools exist for this purpose (Altschul et al., 1990; Hobohm and Sander, 1995; Thomson et al., 1994; Karplus and Barrett, 1998). Most of these tools, however, focus on closely related, or homologous, sequences. Distantly related protein sequences (remote homologs) are of interest to biologists but remain notoriously difficult to find. This dissertation presents a novel method for finding remote homologs in databases of protein sequences. In this method, proteins are characterized according to physicochemical and sequence-based features. Features are then weighted according to their utility in identifying distantly related protein sequences, and the feature weights are optimized by a custom genetic algorithm. Position-specific scoring matrices are used to further improve the ability of the tuned algorithm to generalize its search capability to new sequences. The resulting search method outperforms the best-known techniques for finding distant homologs in terms of both accuracy and computation time.

Full Text

Michael Raymer and Gina Cooper, 'Improving Remote Homology Detection Using Sequence Properties and Position Specific Scoring Matrices,' The 2009 International Conference on Bioinformatics and Computational Biology (BIOCOMP 09), Las Vegas, Nevada, July 13-16, 2009.
