Study of Protein Prediction Related Problems. Ph.D. candidate Le-Yi WEI, 2013.10.16


1 Study of Protein Prediction Related Problems. Ph.D. candidate Le-Yi WEI, 2013.10.16

2 Contents: 1. Background; 2. Methods; 3. Experiments

3 Background

4 Definition of protein
A protein is a sequence over 20 different amino acids (A, C, D, …, V, W, Y). Example in FASTA format:
>Example
PIVDTGSVAPLSAAEKTKIRSAWAPVYSTYETSGVDILVKFFTSTPAAQ
EFFPKFKGLTTADELKKSADVRWHAERIINAVDDAVASMDDTEKMS
MKLRNLSGKHAKSFQVDPEYFKVLAAVIADTVAAGDAGFEKLMSMI

5 Protein prediction related problems
- Protein structural class prediction
- Protein fold prediction
- Multi-functional enzyme prediction
- Protein remote homology detection
- Protein subcellular localization prediction
- Other protein-related problems, etc.

6 Common points
Treat the protein-related problems as classification tasks.
The framework of a classification task: query protein sequence -> data presentation -> classification algorithms -> predicted results.
Two major components: data presentation and classification algorithms.

7 Methods

8 Feature extraction methods
- Primary sequence based, e.g. physicochemical features, n-gram, functional domain, PSSM profile (auto-covariance), etc.
- Secondary structure based, e.g. secondary structure sequence based and probability matrix based
- Sequence-structure based, e.g. triple sequence-structure features

9 Primary-sequence based: n-gram model
Given a query protein sequence, compute the n-gram statistics and obtain the feature vector. [Equations not preserved in the transcript.]
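
Since the slide's formulas did not survive, the following is only a minimal sketch of one common n-gram encoding: the normalized frequency of every length-n substring over the 20 standard amino acids. The function name `ngram_features`, the alphabet ordering, and the normalization by the number of windows are assumptions, not taken from the talk.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids (ordering is an assumption)


def ngram_features(sequence, n=2):
    """Return normalized n-gram frequencies in a fixed order (20**n values)."""
    total = len(sequence) - n + 1  # number of sliding windows
    if total <= 0:
        raise ValueError("sequence is shorter than n")
    counts = Counter(sequence[i:i + n] for i in range(total))
    keys = ["".join(p) for p in product(AMINO_ACIDS, repeat=n)]
    return [counts[k] / total for k in keys]


# Usage: a 2-gram model yields a 400-dimensional vector.
features = ngram_features("PIVDTGSVAPLSAAEKTKIRSAWAPVYSTY", n=2)
print(len(features))
```

With n = 1 this reduces to plain amino acid frequencies; larger n grows the vector as 20^n, which is why small n is typical.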

10 Primary-sequence based: functional domain
[Figure: a query protein sequence is searched with PSI-BLAST against a functional protein database (database sequences 1, 2, 3, …, n-2, n-1, n), giving a binary feature vector such as 0 1 0 1 0 0 ….]
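
One plausible reading of the figure is that each database sequence contributes one binary entry: 1 if PSI-BLAST reports it as a hit for the query, 0 otherwise. The sketch below assumes that reading and assumes the hit list has already been obtained; `functional_domain_vector` and the IDs are hypothetical names for illustration, and no PSI-BLAST call is made.

```python
def functional_domain_vector(hit_ids, database_ids):
    """Return 1 for each database entry hit by the query, else 0, in database order."""
    hits = set(hit_ids)
    return [1 if db_id in hits else 0 for db_id in database_ids]


# Hypothetical database entries and hypothetical PSI-BLAST hits for one query.
database_ids = ["db1", "db2", "db3", "db4", "db5", "db6"]
hit_ids = ["db2", "db4"]

vector = functional_domain_vector(hit_ids, database_ids)
print(vector)  # [0, 1, 0, 1, 0, 0]
```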

11 Primary-sequence based: evolution information
[Figure: the query sequence is searched with PSI-BLAST against a protein database to obtain a Position-Specific Score Matrix (PSSM).]

12 Primary-sequence based: AAC features
Compute and obtain 20-D features. [Equations not preserved in the transcript.]
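
As an illustration only, here is the textbook amino acid composition (AAC): the fraction of each of the 20 amino acids in the sequence, which gives a 20-D vector. The slide's own formula is not preserved, and the talk may instead derive its AAC features from the PSSM, so treat `aac_features` and the alphabet ordering as assumptions.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # ordering is an assumption


def aac_features(sequence):
    """Return the fraction of each standard amino acid in the sequence (20 values)."""
    length = len(sequence)
    if length == 0:
        raise ValueError("empty sequence")
    return [sequence.count(aa) / length for aa in AMINO_ACIDS]


aac = aac_features("AACD")
print(aac[:3])  # fractions of A, C, D
```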

13 Primary-sequence based: auto-covariance (AC) transformation
Compute and obtain 20*g-D features. [Equations not preserved in the transcript.]
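
The 20*g dimensionality matches the commonly used auto-covariance transformation of a PSSM, where each of the 20 columns is correlated with itself at lags 1 to g. The sketch below assumes that standard form, AC(j, lag) = (1 / (L - lag)) * sum over i of (P[i][j] - mean_j) * (P[i + lag][j] - mean_j); the function name and the lag-major feature ordering are assumptions, and the talk's exact definition may differ.

```python
def auto_covariance(pssm, max_lag):
    """Auto-covariance of each PSSM column for lags 1..max_lag.

    pssm is a list of L rows, one per residue, each with one score per column
    (20 for a real PSSM). Returns n_columns * max_lag values, ordered by lag.
    """
    length = len(pssm)
    if not 0 < max_lag < length:
        raise ValueError("max_lag must be between 1 and L - 1")
    n_cols = len(pssm[0])
    means = [sum(row[j] for row in pssm) / length for j in range(n_cols)]
    features = []
    for lag in range(1, max_lag + 1):
        for j in range(n_cols):
            total = sum(
                (pssm[i][j] - means[j]) * (pssm[i + lag][j] - means[j])
                for i in range(length - lag)
            )
            features.append(total / (length - lag))
    return features


# Toy matrix with 2 columns instead of 20, to keep the example readable.
toy_pssm = [[1, 0], [3, 0], [5, 0]]
print(auto_covariance(toy_pssm, max_lag=2))
```

Because the output length depends only on g and not on the protein length L, sequences of different lengths map to vectors of the same size.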

14 Primary-sequence based: consensus sequence
[Figure: PSSM profile -> frequency profile -> consensus sequence, shown for a query sequence and its consensus sequence.]
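
The rule used to derive the consensus sequence is not preserved in the transcript. A simple and common choice, assumed here, is to take the most frequent amino acid at each position of the frequency profile. `consensus_sequence` and the column ordering are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # column ordering of the profile (assumption)


def consensus_sequence(frequency_profile):
    """Pick the amino acid with the highest frequency at each position.

    frequency_profile is a list of rows, one per residue, each holding 20
    frequencies in AMINO_ACIDS order. Ties go to the first amino acid.
    """
    consensus = []
    for row in frequency_profile:
        best = max(range(len(AMINO_ACIDS)), key=lambda j: row[j])
        consensus.append(AMINO_ACIDS[best])
    return "".join(consensus)


def one_hot_like(letter, peak=0.7):
    """Build a toy profile row dominated by one amino acid (illustration only)."""
    rest = (1.0 - peak) / (len(AMINO_ACIDS) - 1)
    return [peak if aa == letter else rest for aa in AMINO_ACIDS]


profile = [one_hot_like(aa) for aa in "MKV"]
print(consensus_sequence(profile))  # MKV
```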

15 Secondary structure based: secondary structure sequence
An example of a query protein sequence: SLFEQLGGQAAVQAVTAQFYANIQAD
Secondary structure sequence predicted by PSI-PRED, which has three states, C (coil), H (helix), E (strand): CCHEHEEEEECCCCHHHHHHEEEEECC
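
To make the three-state representation concrete, the sketch below computes two simple descriptors from a predicted structure string: the fraction of each state and the list of consecutive segments. These descriptors are illustrative assumptions; the talk's actual secondary-structure features are not preserved in the transcript.

```python
from itertools import groupby

STATES = "CHE"  # coil, helix, strand


def state_composition(ss):
    """Return the fraction of positions in each of the three states."""
    return {state: ss.count(state) / len(ss) for state in STATES}


def segments(ss):
    """Return (state, run_length) for each maximal run of identical states."""
    return [(state, len(list(run))) for state, run in groupby(ss)]


ss = "CCHEHEEEEECCCCHHHHHHEEEEECC"  # predicted structure from the slide
print(state_composition(ss))
print(segments(ss))
```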

16 Secondary structure based: structure state confidence matrix
[Figure: an example of a structure state confidence matrix, showing a query protein sequence, its predicted structure sequence, and the predicted confidence.]

17 Secondary structure based: global structural features
Compute from the structure state confidence matrix and obtain the features. [Equations not preserved in the transcript.]

18 Secondary structure based: local structural features
Compute from the structure state confidence matrix and obtain the features. [Equations not preserved in the transcript.]

19 Sequence-structure based: the framework of the triple sequence-structure feature extraction method [Figure not preserved in the transcript.]

20 Classification algorithms
- Commonly used classification algorithms, e.g. Support Vector Machine (SVM), Random Forest (RF), SMO, Naive Bayes, etc.
- Ensemble classification algorithms, e.g. Majority Vote, Average Probability, Selective Ensemble, etc.
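
The two simplest ensemble rules named above can be sketched as follows, assuming each base classifier has already produced either a class label or a vector of class probabilities. The function names and the class labels in the usage example are hypothetical.

```python
from collections import Counter


def majority_vote(labels):
    """Return the label predicted by the most base classifiers.

    Ties are resolved in favor of the label that appears first.
    """
    return Counter(labels).most_common(1)[0][0]


def average_probability(prob_vectors):
    """Average per-class probabilities and return (best_class_index, averages)."""
    n = len(prob_vectors)
    n_classes = len(prob_vectors[0])
    averages = [sum(v[c] for v in prob_vectors) / n for c in range(n_classes)]
    best = max(range(n_classes), key=lambda c: averages[c])
    return best, averages


# Three hypothetical base classifiers voting on one query protein.
print(majority_vote(["all-alpha", "all-beta", "all-alpha"]))
print(average_probability([[0.6, 0.4], [0.2, 0.8], [0.3, 0.7]]))
```

The two rules can disagree: a single very confident classifier can outweigh two lukewarm ones under averaging, while majority voting ignores confidence entirely.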

21 Experiments

22 The framework of RF_PSCP
Webserver site: http://59.77.16.70:8080/RF_PSCP/Index.html

23 Protein structural class prediction: datasets
[Table: three benchmark datasets and three updated large-scale datasets, listed with their sequence similarity.]

24 Results: comparison with existing methods on three benchmark datasets

25 Results: tests of the proposed method on three updated large-scale datasets

26 Results: comparison with different combinations of feature subsets on three benchmark datasets

27 Results: optimization of the Random Forest classifier


29 Q&A

