Application of String Kernels in Protein Sequence Classification
Slide 1 :
Application of String Kernels in Protein Sequence Classification
Nazar Zaki*, Safaai Deris and Rosli Illias
*Bioinformatics Laboratory, College of Information Technology, United Arab Emirates University (UAEU), Al Ain 17555, UAE
firstname.lastname@example.org
Slide 2 :
Introduction
Biological information collections have grown at an unprecedented rate, doubling approximately every 14 months. The key issue now is how to organize and manage this huge amount of new information. Protein remote homology detection is the task of classifying a protein sequence into a predefined family. Protein function controls how all aspects of an organism work. If a method can be designed to reliably group all proteins that share the same functional domains into families, newly discovered proteins could be classified more easily.
Slide 3 :
Introduction Cont.
Recently, the string kernel in conjunction with support vector machines was introduced and shown to achieve good performance on text categorization tasks [Lodhi]. In this paper, we apply the string kernel to the classification of protein sequences into homogeneous families. We empirically evaluate and analyze the performance of this approach and present in-depth experimental results on three selected families from the SCOP database. We then compare it to the most successful current homology detection methods on benchmark SCOP data sets. The string kernel performs well in classifying protein sequences.
Slide 4 :
Background
Several methods have been developed to aid classification, but so far they have had only partial success.
Development steps:
In the first step, researchers focused mainly on pairwise similarities between protein sequences [Smith-Waterman dynamic programming algorithm, BLAST, FASTA].
Further accuracy was achieved by applying Hidden Markov Models (HMMs) to the problems of statistical modeling, database searching and multiple sequence alignment of protein families [SAM, Baldi, Krogh, Haussler].
The SVM-Fisher method gained additional accuracy by modeling the difference between positive and negative examples. The method is a variant of Support Vector Machines (SVMs) [Jaakkola & Haussler].
Another method was then proposed that represents proteins by pairwise sequence similarity scores and combines this representation with SVMs, providing a powerful means of detecting subtle structural and evolutionary relationships among proteins [Li Liao].
Slide 5 :
String Kernel: Why?
The kernel is designed to be very simple and efficient to compute, and it does not depend on any generative model. We produce an SVM classifier that can classify test sequences in linear time. In the experiments reported here, we do not incorporate prior biological information specific to protein classification, although we are currently trying to do so.
Slide 6 :
Kernel Function and Support Vector Machines
Slide 7 :
String Kernel
The String Kernel (SK) builds on the advances demonstrated by Watkins and Haussler, who proposed a sophisticated way of dealing with string data. The basic idea of the SK is to compare strings by means of the substrings they contain. A substring need not be contiguous; the further apart the first and last elements of a substring are, the less weight is given to the similarity.
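The idea above can be sketched in a few lines of Python. This is a minimal illustration of the gap-weighted subsequence kernel using the dynamic-programming recursion published by Lodhi et al., not the authors' implementation; the function names `subsequence_kernel` and `normalized_kernel`, and the parameter name `lam` for the weight decay factor, are assumptions made for this example.

```python
import math


def subsequence_kernel(s, t, k, lam):
    """Gap-weighted subsequence kernel of order k (sketch after Lodhi et al.).

    Every common subsequence of length k contributes lam ** (span in s + span in t),
    so widely spread (gappy) matches are down-weighted.
    """
    n, m = len(s), len(t)
    if min(n, m) < k:
        return 0.0

    # kp[i][a][b] holds the auxiliary value K'_i(s[:a], t[:b]).
    kp = [[[0.0] * (m + 1) for _ in range(n + 1)] for _ in range(k)]
    for a in range(n + 1):
        for b in range(m + 1):
            kp[0][a][b] = 1.0  # K'_0 is 1 for every pair of prefixes

    for i in range(1, k):
        for a in range(i, n + 1):
            kpp = 0.0  # running value K''_i(s[:a], t[:b])
            for b in range(i, m + 1):
                if s[a - 1] == t[b - 1]:
                    kpp = lam * (kpp + lam * kp[i - 1][a - 1][b - 1])
                else:
                    kpp = lam * kpp
                kp[i][a][b] = lam * kp[i][a - 1][b] + kpp

    # Close each length-k match on a pair of equal final characters.
    total = 0.0
    for a in range(k, n + 1):
        for b in range(k, m + 1):
            if s[a - 1] == t[b - 1]:
                total += lam * lam * kp[k - 1][a - 1][b - 1]
    return total


def normalized_kernel(s, t, k, lam):
    """Length-normalized kernel: K(s, t) / sqrt(K(s, s) * K(t, t))."""
    denom = math.sqrt(subsequence_kernel(s, s, k, lam) * subsequence_kernel(t, t, k, lam))
    return subsequence_kernel(s, t, k, lam) / denom if denom > 0 else 0.0


# "cat" and "car" share only the length-2 subsequence "ca",
# which is contiguous in both strings, so the kernel equals lam ** 4.
value = subsequence_kernel("cat", "car", k=2, lam=0.5)
```

The triple loop costs O(k |s| |t|) per pair of sequences, which is why the choice of k matters for long protein sequences.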
Slide 8 :
Slide 9 :
Formulation of the SK
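For reference, the subsequence kernel is usually written as below, following Lodhi et al. Whether the slide used exactly this notation is an assumption; here k is the subsequence length and λ in (0, 1] is the weight decay factor.

```latex
% u ranges over all strings of length k over the alphabet \Sigma.
% \mathbf{i} = (i_1 < \dots < i_k) indexes an occurrence of u in s,
% and l(\mathbf{i}) = i_k - i_1 + 1 is the span of that occurrence.
\phi_u(s) = \sum_{\mathbf{i}\,:\,u = s[\mathbf{i}]} \lambda^{\,l(\mathbf{i})}

K_k(s,t) = \sum_{u \in \Sigma^k} \phi_u(s)\,\phi_u(t)
         = \sum_{u \in \Sigma^k}\;
           \sum_{\mathbf{i}\,:\,u = s[\mathbf{i}]}\;
           \sum_{\mathbf{j}\,:\,u = t[\mathbf{j}]}
           \lambda^{\,l(\mathbf{i}) + l(\mathbf{j})}

% Normalization removes the dependence on sequence length.
\hat{K}_k(s,t) = \frac{K_k(s,t)}{\sqrt{K_k(s,s)\,K_k(t,t)}}
```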
Slide 10 :
Slide 11 :
Experiments: Variability of tunable parameters
The main goal of the experiments is to understand how SVM-SK works in classifying protein sequences. The objective is to observe the influence of the tunable parameters: the subsequence length (k), the weight decay factor (λ) and the regularization parameter (C). Our experiments are performed on: For the evaluation, the F1 measure is used. F1 gives equal weighting to precision and recall: F1 = 2pr / (p + r), where p is the precision and r is the recall. We also report the median rate of false positives (RFP) to evaluate the performance in comparison with the existing algorithms.
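As a small illustration of the evaluation measure, the sketch below computes F1 = 2pr / (p + r) from binary labels. The function name `f1_score` and the label encoding (1 for family member, 0 for non-member) are assumptions made for this example, not details taken from the slides.

```python
def f1_score(true_labels, predicted_labels):
    """F1 measure: harmonic mean of precision p and recall r."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for y, y_hat in pairs if y == 1 and y_hat == 1)  # true positives
    fp = sum(1 for y, y_hat in pairs if y == 0 and y_hat == 1)  # false positives
    fn = sum(1 for y, y_hat in pairs if y == 1 and y_hat == 0)  # false negatives
    if tp == 0:
        return 0.0  # precision and recall are both zero (or undefined)
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    return 2 * p * r / (p + r)


# Hypothetical labels: 3 family members, 2 non-members.
score = f1_score([1, 1, 1, 0, 0], [1, 1, 0, 1, 0])
```

Because F1 is a harmonic mean, a classifier cannot score well by maximizing only precision or only recall.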
Slide 12 :
Efficient Implementation
Our goal is to evaluate the performance of the SK in conjunction with an SVM (SVM-SK). We used an implementation of SVMs (the Kernel-Adatron). To speed up the computation of the kernel matrix, we applied several tricks. First, the program reads a single file containing both the training and test sets. Second, we added two files containing the indexes of the training set and the test set.
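One plausible reading of this setup is that the kernel matrix is computed once over all sequences and the training and test blocks are then selected by index. The sketch below illustrates that reading with in-memory lists instead of files; the file formats are not given in the slides, so all names here are hypothetical. `shared_char_kernel` is a trivial stand-in, not the SK, and filling only the upper triangle of the symmetric matrix is an added assumption rather than one of the tricks named above.

```python
def shared_char_kernel(s, t):
    """Stand-in kernel (NOT the SK): counts matching character pairs."""
    return float(sum(s.count(c) * t.count(c) for c in set(s)))


def gram_matrix(seqs, kernel):
    """Compute the full kernel matrix once, using symmetry K[i][j] == K[j][i]."""
    n = len(seqs)
    gram = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            gram[i][j] = gram[j][i] = kernel(seqs[i], seqs[j])
    return gram


def submatrix(gram, rows, cols):
    """Select a block of the kernel matrix by row and column indexes."""
    return [[gram[r][c] for c in cols] for r in rows]


# Hypothetical data: one pool of sequences plus two index lists.
sequences = ["MKVL", "MKIL", "GGSA", "MKVA"]
train_idx = [0, 2]
test_idx = [1, 3]

gram = gram_matrix(sequences, shared_char_kernel)
k_train = submatrix(gram, train_idx, train_idx)  # train x train, used for fitting
k_test = submatrix(gram, test_idx, train_idx)    # test x train, used for prediction
```

With this layout, changing the train/test split only requires new index lists; no kernel value is recomputed.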
Slide 13 :
Results
The effect of varying the subsequence length, the weight decay factor and the regularization parameter on SVM-SK performance:
Varying k (λ = 0.9; C = 1000): k = 5-7
Varying λ (k = 2; C = 1000): λ = 0.9
Varying C (k = 2; λ = 0.9): C = 10-80
Slide 14 :
Results Cont.
Comparison of four protein homology detection methods
Slide 15 :
Conclusion
Although the SK makes no use of prior biological knowledge about the structure of amino acids within the protein sequence, it captures semantic information to a level at which it can outperform state-of-the-art methods. The results on the SCOP dataset were, however, less encouraging: in most cases the SVM-Fisher method outperformed SVM-SK. Nevertheless, the SVM-SK method has several advantages over SVM-Fisher. The SVM-Fisher method needs a lot of data or prior knowledge to train the hidden Markov model. In addition, because calculating the Fisher scores depends on dynamic programming (quadratic in sequence length for profile HMMs), the kernel matrix is very expensive to compute in practice.
Computational efficiency:
SVM-SK: each kernel is O(n^2), where n is the number of training examples.
SVM-Fisher: SVM optimization is O(n^2) and vectorization is O(nmp), where n is the number of training examples, m is the length of the longest training set sequence, and p is the number of HMM parameters.
Slide 16 :
What’s next?
Future work will consider extending the techniques to incorporate biological knowledge. It will be important to extend the method to identify multiple domains within large protein sequences. Automatic tuning of the parameters mentioned above (k, λ and C) is expected to improve performance. The efficiency of the SK parameters will be analyzed on the entire SCOP database. Further comparison with recent methods will be investigated [Liao, Leslie].
Slide 17 :
Citation
Zaki, N. M., Deris, S., and Illias, R. M. (2005). “Application of string kernels in protein sequence classification”, Applied Bioinformatics, Volume 4, issue 1, pages 45-52.