Predicting Deleterious Mutations. Capstone Project Presentation by Stuart Young, Friday 17th December 2004. Young SP, Radivojac P, Mooney SD.

Presentation transcript:

1 Capstone Project Presentation: Predicting Deleterious Mutations
Stuart Young, Friday 17th December 2004
Young SP, Radivojac P, Mooney SD

2 Deleterious
- “Hurtful or injurious to life or health; noxious” (Oxford English Dictionary)
- “’Tis pity wine should be so deleterious, For tea and coffee leave us much more serious.” (Byron, Don Juan IV, 1821)

3 SNPs
- What is an SNP (single nucleotide polymorphism)?
- Why are SNPs important?
- Some SNPs are nonsynonymous
- The molecular effects of SNPs vary widely

4 Motivation
- Improve on the existing methods for predicting deleterious mutations
- Use protein sequence, evolution and structure data combined with machine learning to identify potentially disease-causing SNPs

5 SNP data is increasingly available
- Over 40 major online databases
- dbSNP is the primary SNP database (contains 5,000,000+ validated human SNPs)
- Many databases contain potentially disease-causing SNPs related to a particular disease

6 Deleterious effects of mutations on proteins
- Function
- Stability
- Expression
- Protein-protein interactions

7 Current Classification Tools: Sequence Approaches
- BLOSUM62: an amino acid substitution score matrix
- SIFT: collects sequence homologues in multiple alignments and identifies non-conservative changes in amino acids (Ng P & Henikoff S, 'Predicting Deleterious Amino Acid Substitutions'. Genome Research, 2001, 11:863-874.)
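To make the idea of a substitution score concrete, the sketch below looks up a few BLOSUM62 entries for a wild-type/mutant residue pair. Only a small excerpt of the matrix is included. The names `substitution_score` and `is_non_conservative`, and the rule that a negative score marks a non-conservative change, are assumptions made for illustration; they are not the cutoff used in the talk or by SIFT.

```python
# Excerpt of the BLOSUM62 matrix; the full matrix covers all 20 x 20 residue pairs.
BLOSUM62_EXCERPT = {
    ("I", "V"): 3,
    ("F", "Y"): 3,
    ("D", "E"): 2,
    ("K", "R"): 2,
    ("D", "W"): -4,
    ("G", "I"): -4,
}

def substitution_score(wild_type, mutant):
    """Look up a score; the matrix is symmetric, so try both orderings."""
    for key in ((wild_type, mutant), (mutant, wild_type)):
        if key in BLOSUM62_EXCERPT:
            return BLOSUM62_EXCERPT[key]
    raise KeyError(f"pair {wild_type}/{mutant} is not in this excerpt")

def is_non_conservative(wild_type, mutant):
    """Illustrative rule (an assumption): negative scores are non-conservative."""
    return substitution_score(wild_type, mutant) < 0
```

In a classifier the raw score would more likely be used as one numeric feature among many than as a hard yes/no rule.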

8 Current Classification Tools: Structural Approaches
- Expert rules: use evolutionary and structural data (Sunyaev et al, 'Prediction of deleterious human alleles'. Human Molecular Genetics, 2001, Vol. 10, No. 6, 593.)
- Decision trees: improved performance based on sequence and structural data; produce intuitive rules

9 Our foundation for the project
Saunders CT & Baker D, 'Evaluation of Structural and Evolutionary Contributions to Deleterious Mutation Prediction'. J. Mol. Biol. (2002) 322, 891–901
- Structural and evolutionary features
- Trained classifiers based on two data sets: experimental mutations and human alleles

10 S&B Training Sets
- Experimental mutations (~5,000)
  - HIV-1 protease
  - E. coli Lac repressor
  - T4 lysozyme
- Human alleles (~350 mutations)
  - 103 'hot' human genes

11 Why two training sets?
- Unbiased human data is hard to get:
  - Many disease-associated mutations are discovered through genetic association studies and may not be causative (i.e., only linked with the causative allele)
  - Effect of mutations is hard to measure
- Experimental 'whole gene mutagenesis' data is considered 'unbiased'

12 Features used in S&B Study
- SIFT
- SIFT + solvent accessibility (SA)
- SIFT + normalized B-factor
- SIFT + Sunyaev expert rules
- SIFT + SA + B-factor

13 Hypothesis
Can we improve on the results of Saunders and Baker by using more structural and sequence properties?

14 Experimental Design
- Classification algorithms
  - Decision trees
  - Support vector classifiers
  - Neural nets
- Additional features
  - Amino acid relative frequencies
  - Additional structural properties

15 Structural Property Values
- Russ Altman (Stanford) developed a vector representation of protein structural sites
- Spheres (1.875 Å → 7.5 Å) centered on the C-alpha atom of the mutation position
- 66 features
- Atom/residue counts within sphere and other features, e.g.:
  - Solubility
  - Solvent accessibility
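The counting step behind such a site vector can be sketched as follows: given the C-alpha coordinates of the mutation position and a list of atom coordinates, count how many atoms fall in each concentric shell. The four equally spaced radii between 1.875 Å and 7.5 Å are an assumption chosen to match the stated range, and `shell_atom_counts` is a hypothetical helper, not part of S-BLEST.

```python
import math

# Assumed outer radii of the shells in angstroms (four equal steps up to 7.5).
SHELL_RADII = (1.875, 3.75, 5.625, 7.5)

def shell_atom_counts(center, atoms, radii=SHELL_RADII):
    """Count atoms in each concentric shell around `center`.

    `center` and every entry of `atoms` are (x, y, z) tuples in angstroms.
    An atom is assigned to the innermost shell whose outer radius contains it;
    atoms beyond the largest radius are ignored.
    """
    counts = [0] * len(radii)
    for atom in atoms:
        distance = math.dist(center, atom)
        for i, radius in enumerate(radii):
            if distance <= radius:
                counts[i] += 1
                break
    return counts
```

A real site vector would repeat this kind of count per atom type or residue type, which is how a handful of shells can yield dozens of features.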

16 Amino Acid Windows
- AA frequencies within a window on either side of the mutation position
- 20 AAs = 20 features
- LEFT and RIGHT → 40 features
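The window features described above can be sketched in a few lines: take the residues to the left and to the right of the mutation position, compute the relative frequency of each of the 20 amino acids in each window, and concatenate the two sets into 40 features. The function name, the 0-based `position`, and the handling of windows truncated at the sequence ends are assumptions for illustration.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes

def window_frequencies(sequence, position, width):
    """Return 40 features: 20 left-window frequencies, then 20 right-window ones.

    `position` is the 0-based index of the mutated residue, which is excluded
    from both windows. Windows are truncated at the ends of the sequence.
    """
    left = sequence[max(0, position - width):position]
    right = sequence[position + 1:position + 1 + width]
    features = []
    for window in (left, right):
        counts = Counter(window)
        total = len(window)
        for aa in AMINO_ACIDS:
            # An empty window (mutation at a terminus) gives all-zero frequencies.
            features.append(counts[aa] / total if total else 0.0)
    return features
```

Calling `window_frequencies("AAGKLLV", 3, 3)` uses "AAG" as the left window and "LLV" as the right window around the K at index 3.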

17 Amino Acid Windows

18 Tools
- Databases
  - PDB: protein structure data
  - S-BLEST: structural features
- Software
  - Perl 5.8.0
  - Matlab (NN, PRTools (DT), SVC)

19 List of Features Used
- BLOSUM62, disorder, secondary structure, molecular weight
- Grouped amino acid frequency windows of varying widths
- SIFT
- S-BLEST (vector contains four sub-shells spreading outward from site)
- Solvent accessibility (C-beta density, i.e., the number of C-beta atoms around the site)

20 Comparison with S&B Results

21 1. Human Data Set
- Human allele dataset as train and test set
- Ensembles of decision trees for classification
- 20-fold cross validation
- Progressively added features to see their effect on performance
- Because structural data was not available for all mutation sites, we used a subset of the original Saunders and Baker training set
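The 20-fold cross-validation protocol can be sketched independently of the classifier: shuffle the samples, split them into 20 folds, train on 19 and test on the held-out fold, then average the accuracies. This is a minimal sketch in plain Python rather than the Matlab/PRTools setup used in the project. `train_fn` is a hypothetical hook where an ensemble of decision trees would plug in, and `majority_class_trainer` is only a placeholder learner.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and deal them into k folds of near-equal size."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def cross_validate(samples, labels, train_fn, k=20, seed=0):
    """Mean held-out accuracy over k folds.

    train_fn(train_samples, train_labels) must return a predict(sample) callable.
    """
    accuracies = []
    for held_out in k_fold_indices(len(samples), k, seed):
        if not held_out:  # fewer samples than folds
            continue
        held = set(held_out)
        train_idx = [i for i in range(len(samples)) if i not in held]
        predict = train_fn([samples[i] for i in train_idx],
                           [labels[i] for i in train_idx])
        correct = sum(predict(samples[i]) == labels[i] for i in held_out)
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / len(accuracies)

def majority_class_trainer(train_samples, train_labels):
    """Placeholder learner: always predicts the most common training label."""
    most_common = max(set(train_labels), key=train_labels.count)
    return lambda sample: most_common
```

Adding features progressively, as on the slide, amounts to rerunning `cross_validate` with wider feature vectors in `samples` and comparing the averaged accuracies.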

22 Best Features

23 2. Experimental Data Set
- Same as human data set but using experimental mutations for training and testing

24 Evaluation of S-BLEST Using a Random Subset of the Experimental Training Set

25 3. Cross-classification
- Used the same features described above
- Trained on one dataset and tested on the other:
  - Human to experimental
  - Experimental to human
  - Experimental gene to experimental gene
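Cross-classification differs from cross-validation in that the training and test sets come from different sources. The sketch below trains on each named dataset and tests on every other one. For brevity it uses a single-score threshold as a stand-in classifier, which is an assumption and not the decision-tree ensembles used in the project. All function names are hypothetical, and the toy numbers are made up purely to exercise the code.

```python
def train_threshold(scores, labels):
    """Pick the cutoff that best separates the training labels.

    A sample is predicted deleterious (label 1) when its score <= cutoff.
    """
    best_cutoff, best_correct = None, -1
    for cutoff in sorted(set(scores)):
        correct = sum((s <= cutoff) == bool(y) for s, y in zip(scores, labels))
        if correct > best_correct:
            best_cutoff, best_correct = cutoff, correct
    return best_cutoff

def accuracy(cutoff, scores, labels):
    """Fraction of samples whose thresholded prediction matches the label."""
    hits = sum((s <= cutoff) == bool(y) for s, y in zip(scores, labels))
    return hits / len(labels)

def cross_classify(datasets):
    """Train on each dataset, test on every other; keys are (train, test) names."""
    results = {}
    for train_name, (train_x, train_y) in datasets.items():
        cutoff = train_threshold(train_x, train_y)
        for test_name, (test_x, test_y) in datasets.items():
            if test_name != train_name:
                results[(train_name, test_name)] = accuracy(cutoff, test_x, test_y)
    return results

# Made-up toy data: (scores, labels) per dataset, 1 = deleterious.
TOY_DATASETS = {
    "human": ([-3, -2, 1, 2], [1, 1, 0, 0]),
    "experimental": ([-4, -1, 0, 3], [1, 1, 0, 0]),
}
```

The gene-to-gene variant on the slide fits the same shape: pass one entry per experimental gene in `datasets` instead of one per data source.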

30 Summary of Results
- Human data set: 80% accuracy (up from 70%)
- Experimental data set: 87% accuracy (up from 79.5%)
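One way to read these numbers is as a reduction in error rate rather than a gain in accuracy. The helper below is a small worked calculation on the four percentages quoted above; the relative error reduction is derived arithmetic, not a figure reported in the talk, and `error_reduction` is a hypothetical name.

```python
def error_reduction(old_accuracy, new_accuracy):
    """Return (absolute gain in percentage points, relative drop in error rate).

    Both arguments are accuracies in percent, e.g. 70 and 80.
    """
    gain = new_accuracy - old_accuracy
    old_error = 100.0 - old_accuracy
    return gain, gain / old_error

human = error_reduction(70.0, 80.0)         # 10 points; errors fall from 30% to 20%
experimental = error_reduction(79.5, 87.0)  # 7.5 points; errors fall from 20.5% to 13%
```

On both data sets the gain removes roughly a third of the baseline's errors (10/30 and 7.5/20.5).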

31 Conclusion
- Prediction tools CAN identify deleterious mutations
- We believe that further study is warranted to identify over-fitted classifiers to further improve classification accuracy on real-world data

32 Acknowledgements
- People
  - Andrew Campen (CCBB IT, IUPUI)
  - Brandon Peters (CCBB, IUPUI)
  - Haixu Tang (Capstone Coordinator, IUB)
- Funding
  - This work was funded by a grant from the Showalter Trust (Sean Mooney, PI), INGEN, and an IUPUI McNair Scholarship. The Indiana Genomics Initiative (INGEN) of Indiana University is supported in part by Lilly Endowment Inc.

33 Thank You
