Abstract: Detecting protein-protein interactions is a central
problem in computational biology, and aberrant interactions of this kind
may be implicated in a number of neurological disorders. As a result,
the prediction of protein-protein interactions has recently received
considerable attention from biologists around the globe.
Computational tools that are capable of effectively identifying
protein-protein interactions are much needed. In this paper, we
propose a method to detect protein-protein interactions based on a
substring similarity measure. Two protein sequences may interact by
means of the similarities of the substrings they contain. When
applied to the currently available protein-protein interaction data for
the yeast Saccharomyces cerevisiae, the proposed method delivered a
reasonable improvement over existing methods.
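The abstract does not define its substring similarity measure, so the following is only a minimal sketch of the general idea, assuming (hypothetically) that similarity is the Jaccard overlap between the sets of length-k substrings of two sequences. The function names and the choice k = 3 are illustrative and are not taken from the paper.

```python
def substrings(seq: str, k: int) -> set:
    """Return the set of all length-k substrings (k-mers) of seq."""
    # range() is empty when the sequence is shorter than k, giving an empty set.
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def substring_similarity(a: str, b: str, k: int = 3) -> float:
    """Jaccard overlap of the k-mer sets of a and b.

    This is an assumed stand-in for the paper's measure, not the
    authors' definition.
    """
    sa, sb = substrings(a, k), substrings(b, k)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)
```

A pair of sequences scoring above some threshold could then be flagged as a candidate interaction; how such a threshold would be chosen is not stated in the abstract.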
Abstract: Introduction: The production of biological information has become much greater than its consumption. The key issue now is how to organise and manage the huge amount of novel information to facilitate access to this useful and important biological information. One core problem in classifying biological information is the annotation of new protein sequences with structural and functional features.
Method: This article introduces the application of string kernels in classifying protein sequences into homogeneous families. A string kernel approach used in conjunction with support vector machines has been shown to achieve good performance in text categorisation tasks. We evaluated and analysed the performance of this approach, and we present experimental results on three selected families from the SCOP (Structural Classification of Proteins) database. We then compared the overall performance of this method with the existing protein classification methods on benchmark SCOP datasets.
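To make the idea of a string kernel concrete, here is a minimal sketch of one simple member of that family, the k-spectrum kernel, which counts the length-k substrings two sequences share. This is an assumption made for illustration: the article may use a different string kernel (for example one that allows gaps), and the function names and k = 3 are hypothetical.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every length-k substring of seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(a: str, b: str, k: int = 3) -> int:
    """Inner product of the k-mer count vectors of a and b."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    # A Counter returns 0 for a missing key, so absent k-mers contribute nothing.
    return sum(n * cb[kmer] for kmer, n in ca.items())


def normalized_kernel(a: str, b: str, k: int = 3) -> float:
    """Cosine-normalised kernel value, so that k(x, x) = 1 for non-trivial x."""
    kaa = spectrum_kernel(a, a, k)
    kbb = spectrum_kernel(b, b, k)
    if kaa == 0 or kbb == 0:
        return 0.0
    return spectrum_kernel(a, b, k) / sqrt(kaa * kbb)
```

A matrix of such kernel values over the training sequences could be passed to any support vector machine implementation that accepts a precomputed kernel.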
Results: According to the F1 performance measure and the rate of false positives (RFP) measure, the string kernel method performs well in classifying protein sequences. The method outperformed all the generative methods and is comparable with the SVM-Fisher method.
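For reference, the two measures can be sketched as follows. F1 is the standard harmonic mean of precision and recall. The abstract does not define RFP, so the version here assumes the definition commonly used in protein homology benchmarks: for a given positive test sequence, the fraction of negative test sequences that score as high or higher. The function names are illustrative.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall from raw counts."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def rate_of_false_positives(positive_score: float, negative_scores: list) -> float:
    """Fraction of negatives scoring at least as high as one positive.

    Assumed definition; the abstract itself does not spell RFP out.
    """
    if not negative_scores:
        return 0.0
    at_least_as_high = sum(1 for s in negative_scores if s >= positive_score)
    return at_least_as_high / len(negative_scores)
```

Under this reading, a lower RFP is better, while a higher F1 is better.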
Discussion: Although the string kernel approach makes no use of prior biological knowledge, it still captures sufficient biological information to enable it to outperform some of the state-of-the-art methods.
Abstract: A few years ago, Jaakkola and Haussler published a method of combining generative and
discriminative approaches for detecting protein homologies. The method was a variant
of support vector machines using a new kernel function called the Fisher kernel. They begin
by training a generative hidden Markov model for a protein family. Then, using the
model, they derive a vector of features called Fisher scores that are assigned to the
sequence, and then use a support vector machine in conjunction with the Fisher scores for
protein homology detection. In this paper, we revisit the idea of using a discriminative
approach, and in particular support vector machines, for protein homology detection.
However, in place of the Fisher scoring method, we present a new Hidden Markov Model
Combining Scores approach. Six scoring algorithms are combined as a way of extracting
features from a protein sequence. Experiments show that our method improves on
previous methods for homology detection of protein domains.