Published by Loraine Lawrence. Modified over 9 years ago.
1
Study of Protein Prediction Related Problems
Ph.D. candidate: Le-Yi WEI
2013.10.16
2
Contents: 1 Background; 2 Methods; 3 Experiments
3
Background
4
Definition of protein: a sequence built from 20 different amino acids (A, C, D, …, V, W, Y).
Example:
>Example
PIVDTGSVAPLSAAEKTKIRSAWAPVYSTYETSGVDILVKFFTSTPAAQ
EFFPKFKGLTTADELKKSADVRWHAERIINAVDDAVASMDDTEKMS
MKLRNLSGKHAKSFQVDPEYFKVLAAVIADTVAAGDAGFEKLMSMI
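A minimal sketch of reading a FASTA record like the example above and checking it against the 20-letter amino acid alphabet. The function names are hypothetical, and treating every other letter (e.g. X, B, Z) as invalid is an assumption made for illustration.

```python
# The 20 standard amino acids, as one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs.

    Sequence lines before the first '>' header are ignored.
    """
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            # Sequences may be wrapped over several lines or contain spaces.
            chunks.append("".join(line.split()).upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def is_standard_protein(seq):
    """True if seq is non-empty and uses only the 20 standard amino acids."""
    return len(seq) > 0 and all(aa in AMINO_ACIDS for aa in seq)
```

For instance, `parse_fasta(">Example\nPIVDTGSVAP\nLSAAEK\n")` joins the wrapped lines into one sequence under the header `Example`.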
5
Protein prediction related problems:
- Protein structural class prediction
- Protein fold prediction
- Multi-functional enzyme prediction
- Protein remote homology detection
- Protein subcellular localization prediction
- Other protein-related problems, etc.
6
Common points: treat the protein-related problems as classification tasks.
The framework of a classification task: query protein sequence → data representation → classification algorithms → predicted results.
Two major components: data representation and classification algorithms.
7
Methods
8
Feature extraction methods:
- Primary sequence based, e.g. physicochemical features, n-gram, functional domain, PSSM profile (auto-covariance), etc.
- Secondary structure based, e.g. secondary-sequence based and probability-matrix based
- Sequence-structure based, e.g. triple sequence-structure features
9
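Primary-sequence based: n-gram model. Given a query protein sequence, compute the occurrence frequency of every n-gram (a run of n consecutive amino acids) and obtain a 20^n-D feature vector.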
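A minimal sketch of the n-gram model: count every window of n consecutive residues and normalize by the number of windows, giving a 20^n-D frequency vector. The function name and the choice to skip windows containing non-standard residues are assumptions for illustration.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def ngram_features(seq, n=2):
    """Return the 20**n n-gram frequency vector of a protein sequence.

    Vector positions follow the lexicographic order of itertools.product
    over AMINO_ACIDS (AA, AC, AD, ... for n=2).
    """
    grams = ["".join(p) for p in product(AMINO_ACIDS, repeat=n)]
    index = {g: i for i, g in enumerate(grams)}
    counts = [0] * len(grams)
    total = 0
    for i in range(len(seq) - n + 1):
        gram = seq[i:i + n]
        if gram in index:  # skip windows containing non-standard residues
            counts[index[gram]] += 1
            total += 1
    if total == 0:
        return [0.0] * len(grams)
    return [c / total for c in counts]
```

With n = 1 this reduces to plain amino acid composition (20-D); n = 2 already gives 400 dimensions, which is why small n is typical.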
10
Primary-sequence based: functional domain. A query protein sequence is searched with PSI-BLAST against a functional protein database (database sequences 1, 2, 3, …, n-2, n-1, n). The search result is encoded as a binary feature vector (e.g. 0 1 0 1 0 0 …), with one element per database sequence.
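A minimal sketch of turning a search result into the binary feature vector, assuming the PSI-BLAST hits have already been parsed into a mapping from database sequence ID to E-value. The function name and the E-value cutoff of 1e-3 are hypothetical choices, not values from the presentation.

```python
def domain_vector(hits, database_ids, evalue_cutoff=1e-3):
    """Binary functional-domain feature vector.

    hits: dict mapping database sequence ID -> E-value of its best hit.
    database_ids: all database sequence IDs, in a fixed order.
    Element k is 1 if database sequence k was hit with E-value <= cutoff.
    """
    significant = {seq_id for seq_id, evalue in hits.items() if evalue <= evalue_cutoff}
    return [1 if seq_id in significant else 0 for seq_id in database_ids]
```

Fixing the order of `database_ids` once matters: every query must be encoded against the same ordering so that feature k always means the same database entry.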
11
Primary-sequence based: Position-Specific Score Matrix (PSSM). The query sequence is searched with PSI-BLAST against a protein database; the resulting PSSM carries evolutionary information.
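A minimal sketch of loading a PSSM, assuming it was saved in the ASCII format that BLAST+ `psiblast` writes with `-out_ascii_pssm`. In that format each data row holds the position, the residue, 20 log-odds scores, 20 weighted observed percentages, and two trailing statistics, with columns in the order ARNDCQEGHILKMFPSTWYV. The parser relies on whitespace-separated fields; the function name is hypothetical.

```python
def parse_ascii_pssm(text):
    """Parse PSI-BLAST ASCII PSSM text into (scores, frequencies).

    scores: L rows of 20 integer log-odds scores.
    frequencies: L rows of 20 weighted observed percentages.
    Columns follow PSI-BLAST's order ARNDCQEGHILKMFPSTWYV.
    """
    scores, freqs = [], []
    for line in text.splitlines():
        tokens = line.split()
        # Data rows start with a position number and a one-letter residue;
        # header, column-label and footer lines fail this test.
        if len(tokens) >= 42 and tokens[0].isdigit() and tokens[1].isalpha():
            scores.append([int(t) for t in tokens[2:22]])
            freqs.append([int(t) for t in tokens[22:42]])
    return scores, freqs
```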
12
Primary-sequence based: AAC (amino acid composition) features, a 20-D feature vector with one value per amino acid type.
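Two hedged sketches of 20-D composition features. `sequence_aac` is the standard amino acid composition of the raw sequence; `pssm_aac` averages each PSSM column over all L positions, a PSSM-based variant often paired with the auto-covariance features. Which of the two the slide computes is not stated, so both are shown as assumptions; the function names are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def sequence_aac(seq):
    """Fraction of each standard amino acid in seq (order: AMINO_ACIDS)."""
    if not seq:
        raise ValueError("empty sequence")
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def pssm_aac(pssm):
    """Mean of each of the 20 PSSM columns over all L rows.

    Keeps the PSSM's own column order (ARNDCQEGHILKMFPSTWYV for PSI-BLAST).
    """
    if not pssm:
        raise ValueError("empty PSSM")
    L = len(pssm)
    return [sum(row[j] for row in pssm) / L for j in range(20)]
```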
13
Primary-sequence based: auto-covariance (AC) transformation of the PSSM, giving 20*g-D features (20 values for each of g lags).
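A minimal sketch of the auto-covariance transformation, assuming the form commonly applied to PSSMs: for column i and lag lg = 1, ..., g, AC(i, lg) = 1/(L - lg) * sum over j = 1..L-lg of (P[j][i] - mean_i) * (P[j+lg][i] - mean_i), where mean_i is the average of column i. This yields 20 * g values. Whether it matches the presenter's exact definition (for example, any prior normalization of the scores) is an assumption.

```python
def pssm_auto_covariance(pssm, max_lag):
    """20 * max_lag auto-covariance features of an L x 20 PSSM.

    Features are ordered lag-major: all 20 columns for lag 1,
    then all 20 columns for lag 2, and so on.
    """
    L = len(pssm)
    if not 1 <= max_lag < L:
        raise ValueError("max_lag must satisfy 1 <= max_lag < L")
    means = [sum(row[i] for row in pssm) / L for i in range(20)]
    features = []
    for lag in range(1, max_lag + 1):
        for i in range(20):
            total = sum(
                (pssm[j][i] - means[i]) * (pssm[j + lag][i] - means[i])
                for j in range(L - lag)
            )
            features.append(total / (L - lag))
    return features
```

Because the output length depends only on g, proteins of different lengths map to vectors of the same size, which is what a standard classifier needs.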
14
Primary-sequence based: PSSM profile → frequency profile → consensus sequence.
Example: consensus sequence vs. a query sequence.
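A minimal sketch of deriving a consensus sequence from a frequency profile by taking, at each position, the amino acid with the highest frequency. It assumes the frequency profile is the L x 20 "weighted observed percentages" block of the PSI-BLAST ASCII output (column order ARNDCQEGHILKMFPSTWYV); keeping the query residue where a row is all zeros, and resolving ties by column order, are hypothetical choices.

```python
PSSM_ORDER = "ARNDCQEGHILKMFPSTWYV"  # PSI-BLAST column order


def consensus_sequence(freq_profile, query):
    """Most frequent amino acid at each position of the profile.

    Ties go to the earliest column in PSSM_ORDER; rows with no
    observations (all zeros) keep the query residue.
    """
    residues = []
    for row, query_residue in zip(freq_profile, query):
        best = max(row)
        if best == 0:
            residues.append(query_residue)
        else:
            residues.append(PSSM_ORDER[row.index(best)])
    return "".join(residues)
```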
15
Secondary structure based: secondary structure sequence.
An example of a query protein sequence: SLFEQLGGQAAVQAVTAQFYANIQAD
Predicted secondary structure sequence (by PSI-PRED), which has three states, C (coil), H (helix) and E (strand): CCHEHEEEEECCCCHHHHHHEEEEECC
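A minimal sketch of two simple descriptors of a predicted three-state string: the content of each state and its run-length segments. These particular descriptors are illustrative assumptions, not necessarily the ones used in the presentation; the function names are hypothetical.

```python
from itertools import groupby


def ss_composition(ss):
    """Fraction of coil (C), helix (H) and strand (E) positions."""
    if not ss:
        raise ValueError("empty structure string")
    return {state: ss.count(state) / len(ss) for state in "CHE"}


def ss_segments(ss):
    """Consecutive runs of one state, as (state, length) pairs."""
    return [(state, len(list(run))) for state, run in groupby(ss)]
```

For the slide's predicted string, `ss_segments` starts with `("C", 2), ("H", 1), ("E", 1), ("H", 1), ("E", 5)`, so features such as the number or mean length of helix and strand segments can be read off directly.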
16
Secondary structure based: structure state confidence matrix. An example of a structure state confidence matrix lists the query protein sequence, the predicted structure sequence, and the predicted confidence of each state at every position.
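A minimal sketch of building the structure state confidence matrix, assuming it is read from a PSIPRED `.ss2` file, whose data rows hold the position, the residue, the predicted state, and the confidences for C, H and E in that order. The function name is hypothetical and the file layout should be checked against the PSIPRED version in use.

```python
def parse_ss2(text):
    """Parse PSIPRED .ss2 text.

    Returns (sequence, predicted_states, confidence_matrix), where the
    matrix has one [P(C), P(H), P(E)] row per residue.
    """
    seq, states, conf = [], [], []
    for line in text.splitlines():
        tokens = line.split()
        # Data rows: "<pos> <residue> <state> <C> <H> <E>"; comments start with '#'.
        if len(tokens) == 6 and tokens[0].isdigit():
            seq.append(tokens[1])
            states.append(tokens[2])
            conf.append([float(x) for x in tokens[3:6]])
    return "".join(seq), "".join(states), conf
```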
17
Secondary structure based: global structural features, computed from the structure state confidence matrix.
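A hypothetical example of a global structural feature: the mean confidence of each state (C, H, E) over the whole sequence, giving a 3-D vector. This is an illustrative assumption; the presentation's actual global features are not specified in the transcript.

```python
def global_structural_features(conf):
    """Hypothetical 3-D global descriptor.

    conf: L x 3 confidence matrix with columns [C, H, E].
    Returns the mean confidence of each state over all L positions.
    """
    if not conf:
        raise ValueError("empty confidence matrix")
    L = len(conf)
    return [sum(row[k] for row in conf) / L for k in range(3)]
```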
18
Secondary structure based: local structural features, computed from the structure state confidence matrix.
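A hypothetical example of local structural features: split the confidence matrix into a fixed number of consecutive segments and average each state's confidence within every segment, giving 3 * n_segments values. The segmentation scheme and the `n_segments` parameter are illustrative assumptions, not the presentation's definition.

```python
def local_structural_features(conf, n_segments=2):
    """Hypothetical local descriptor: per-segment mean confidence of C, H, E.

    The L rows are split into n_segments nearly equal consecutive blocks;
    the result is 3 * n_segments values, segment by segment.
    """
    L = len(conf)
    if n_segments < 1 or L < n_segments:
        raise ValueError("need 1 <= n_segments <= L")
    features = []
    for s in range(n_segments):
        start = s * L // n_segments
        end = (s + 1) * L // n_segments
        block = conf[start:end]
        features.extend(sum(row[k] for row in block) / len(block) for k in range(3))
    return features
```

Unlike the whole-sequence average, this keeps coarse positional information, e.g. whether helices dominate the N-terminal or the C-terminal half.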
19
Sequence-structure based: the framework of the triple sequence-structure feature extraction method.
20
Classification algorithms:
- Commonly used classification algorithms, e.g. Support Vector Machine (SVM), Random Forest (RF), SMO, Naive Bayes, etc.
- Ensemble classification algorithms, e.g. majority vote, average probability, selective ensemble, etc.
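A minimal sketch of two of the ensemble rules named above, majority vote over hard labels and average probability over class scores, assuming the base classifiers' outputs have already been collected as plain lists. The function names and input layout are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine hard labels from several base classifiers by majority vote.

    predictions: one list of labels per base classifier, all the same
    length (one label per query protein). On a tie, Counter.most_common
    keeps insertion order, so the tied label voted first (in classifier
    order) wins.
    """
    n_samples = len(predictions[0])
    combined = []
    for i in range(n_samples):
        votes = Counter(clf_preds[i] for clf_preds in predictions)
        combined.append(votes.most_common(1)[0][0])
    return combined


def average_probability(probas, classes):
    """Average class probabilities across classifiers, then take the argmax.

    probas: one entry per classifier; each entry is a list with one
    probability list per sample, aligned with `classes`.
    """
    n_samples = len(probas[0])
    combined = []
    for i in range(n_samples):
        avg = [sum(p[i][k] for p in probas) / len(probas) for k in range(len(classes))]
        combined.append(classes[avg.index(max(avg))])
    return combined
```

Average probability uses each classifier's confidence rather than only its top label, so a single very confident classifier can outweigh two uncertain ones; majority vote only needs hard labels.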
21
Experiments
22
The framework of RF_PSCP. Webserver site: http://59.77.16.70:8080/RF_PSCP/Index.html
23
Datasets for protein structural class prediction: three benchmark datasets and three updated large-scale datasets, with their sequence similarity.
24
Results: comparison with existing methods on three benchmark datasets.
25
Results: tests of the proposed method on three updated large-scale datasets.
26
Results: comparison of different combinations of feature subsets on three benchmark datasets.
27
Results: optimization of the random forest classifier.
28
29
Q&A!