Using machine learning to predict binding sites in proteins. Jenelle Bray, Stanford University. October 10, 2014. #GHC14

1 Using machine learning to predict binding sites in proteins. Jenelle Bray, Stanford University. October 10, 2014. #GHC14

2 Protein Function
• Proteins are biological molecules that:
  − Catalyze metabolic reactions
  − Replicate DNA
  − Transport molecules
  − Respond to stimuli

3 Protein Structure

4 Protein Binding Sites (image credit: UC Davis ChemWiki)

5 Goal: Predict where ATP Binds
• Adenosine triphosphate (ATP) is the primary energy currency of the cell
• Carries chemical energy within the cell for most reactions that require energy

6 ATP Model Based on FEATURE
• Builds 3D models of the local environment around a protein site, given training sets
• Calculates chemical properties at varying radial distances from the site and creates a vector containing the value of each property in each radial volume
• Constructs a Naïve Bayes model by comparing the distribution of feature vectors between positive and negative sites
References:
Liang MP, Banatao DR, Klein TE, Brutlag DL, Altman RB. WebFEATURE: an interactive web tool for identifying and visualizing functional sites on macromolecular structures. Nucleic Acids Res. 2003 Jul 1;31(13):3324-7.
Wei L, Altman RB. Recognizing protein binding sites using statistical descriptions of their 3D environments. Pac Symp Biocomput. 1998:497-508.
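
A minimal sketch of the idea on this slide: a per-shell property vector around a site, scored with a naive Bayes style log-odds. Everything specific here is an assumption made for illustration and is not FEATURE's actual code: the function names, the use of atom counts per element as the only property, the 1.25 Å shell width, the six shells, and the Gaussian per-feature likelihood.

```python
import math

def shell_vector(site, atoms, elements=("C", "N", "O"),
                 shell_width=1.25, n_shells=6):
    """Count atoms of each element in concentric shells around `site`.

    `atoms` is a list of (element, (x, y, z)) tuples. The result is a flat
    vector ordered element by element, shell by shell (hypothetical layout).
    """
    vec = [0] * (len(elements) * n_shells)
    for element, xyz in atoms:
        if element not in elements:
            continue
        shell = int(math.dist(site, xyz) // shell_width)
        if shell < n_shells:  # ignore atoms beyond the outermost shell
            vec[elements.index(element) * n_shells + shell] += 1
    return vec

def _mean_var(vectors, j, var_floor):
    """Mean and floored variance of feature j across a set of vectors."""
    values = [v[j] for v in vectors]
    mean = sum(values) / len(values)
    var = sum((x - mean) ** 2 for x in values) / len(values)
    return mean, max(var, var_floor)

def naive_bayes_log_odds(vec, pos_vecs, neg_vecs, var_floor=0.25):
    """Sum per-feature log-likelihood ratios (positive vs. negative sites).

    Features are treated as independent, which is the naive Bayes assumption;
    the Gaussian likelihood is a stand-in chosen for brevity.
    """
    score = 0.0
    for j, x in enumerate(vec):
        mp, vp = _mean_var(pos_vecs, j, var_floor)
        mn, vn = _mean_var(neg_vecs, j, var_floor)
        log_p = -0.5 * math.log(2 * math.pi * vp) - (x - mp) ** 2 / (2 * vp)
        log_n = -0.5 * math.log(2 * math.pi * vn) - (x - mn) ** 2 / (2 * vn)
        score += log_p - log_n
    return score
```

A positive score means the site's environment looks more like the positive training sites than the negative ones under these assumptions.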

7 Extending the Use of FEATURE
• So far, FEATURE has only been used to predict a protein functional site or a single ion binding site, never a whole small molecule (ligand)
• Want to combine FEATURE models into one overall model that predicts ATP binding

8 Training Set
• Positive set: all PDBs (experimental 3D protein structure files) with ATP bound, clustered by 30% sequence similarity, with one protein from each cluster used; this gives 190 proteins
• Negative set: proteins with ligands not containing any part of ATP, also clustered by 30% similarity; this gives 3345 proteins
• Leave 20% of the training data out for validation
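
The selection and hold-out steps could be sketched as follows. This assumes cluster assignments have already been computed by some sequence-clustering tool (not shown), and the helper names, the "first seen" choice of representative, and the random 80/20 split are all hypothetical; the slides do not say how the representative or the held-out 20% were chosen.

```python
import random

def one_per_cluster(cluster_of):
    """Keep the first protein seen in each cluster as its representative.

    `cluster_of` maps protein ID -> cluster ID (assumed precomputed at
    30% sequence similarity by an external tool).
    """
    representatives = {}
    for protein_id, cluster_id in cluster_of.items():
        representatives.setdefault(cluster_id, protein_id)
    return sorted(representatives.values())

def hold_out(ids, fraction=0.2, seed=0):
    """Shuffle `ids` and split into (training, validation) lists."""
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    n_val = round(len(shuffled) * fraction)
    return shuffled[n_val:], shuffled[:n_val]
```

With the set sizes quoted on the slide, `hold_out(range(190))` leaves 152 positives for training and 38 for validation.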

9 Combining Atomic Models
• Build individual FEATURE models for 3 atoms in each section of ATP
• Need to combine the 9 atomic models to give one overall molecular model
• Train a logistic regression model with the atomic FEATURE scores as features

10 ATP Docking
• Want to train the model on ATP poses that can actually fit in a binding pocket
• For positive proteins, calculate the FEATURE score for each of the 9 atoms in the crystal structure ATP
• For negatives, use AutoDock Vina to dock 1000 ATP poses into a protein
• Do this for a random sample of negatives equal in number to the positive proteins

11 Choosing ATP Poses for Training
• For each negative protein, calculate FEATURE scores of the nine atoms for all 1000 ATP poses, then choose the pose with the highest sum of (normalized) individual scores
  − Ensures the model can distinguish good ATP poses in non-ATP-binding proteins from those in real ATP-binding proteins
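
A sketch of the pose selection for one negative protein. The slide does not say how the scores were normalized, so this assumes a z-score per atom across that protein's poses; the function name and input layout are also hypothetical.

```python
from statistics import mean, pstdev

def pick_best_pose(pose_scores):
    """Return (index, total) of the pose with the highest summed z-scores.

    `pose_scores` is a list of poses, each a list of per-atom FEATURE scores
    (nine per pose in the talk). Normalizing per atom keeps one atom model
    with a wide score range from dominating the sum.
    """
    n_atoms = len(pose_scores[0])
    columns = [[pose[j] for pose in pose_scores] for j in range(n_atoms)]
    means = [mean(col) for col in columns]
    # Fall back to 1.0 when every pose has the same score for an atom.
    spreads = [pstdev(col) or 1.0 for col in columns]
    totals = [
        sum((pose[j] - means[j]) / spreads[j] for j in range(n_atoms))
        for pose in pose_scores
    ]
    best = max(range(len(pose_scores)), key=totals.__getitem__)
    return best, totals[best]
```

The selected pose is the negative example that looks most ATP-like to the atomic models, which is what makes it a hard case for the combined model.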

12 Logistic Regression Model
• Build a logistic regression model with the 9 individual atomic FEATURE scores for each protein in the training set
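
A minimal sketch of combining the atomic scores with logistic regression, written in plain Python with batch gradient descent so it has no dependencies. The function names, learning rate, and epoch count are assumptions for illustration; the talk does not state which implementation or settings were used.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.1, epochs=500):
    """Fit weights and bias by batch gradient descent on the log loss.

    X: one row per protein, each row the atomic FEATURE scores of its pose.
    y: 1 for ATP-binding proteins, 0 for negatives.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(w, b, x):
    """Estimated probability that a pose with atomic scores x binds ATP."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
```

The learned weights indicate how much each atomic model contributes to the overall molecular score.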

13 Model Validation
• Dock 1000 poses into all training proteins (positive and negative)
• Use the logistic regression model to score and rank every pose, and choose the highest scoring pose for each protein
• Validation AUC = 0.83
• Compares favorably to dock energy (physics-based model) with AUC = 0.74
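
The validation step can be sketched as: take the best model score over each protein's docked poses, then compute the AUC over proteins. The AUC below uses the pairwise (Mann-Whitney) definition; the function names and the data layout are hypothetical, and the numbers in the usage note are made up, not the talk's data.

```python
def protein_scores(poses_by_protein):
    """Reduce each protein to the score of its highest-scoring pose.

    `poses_by_protein` maps protein ID -> (label, list of pose scores),
    where label is 1 for ATP-binding proteins and 0 otherwise.
    """
    labels, scores = [], []
    for label, pose_scores in poses_by_protein.values():
        labels.append(label)
        scores.append(max(pose_scores))
    return labels, scores

def roc_auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))
```

For example, with toy input `{"p1": (1, [0.2, 0.9]), "p2": (1, [0.4, 0.3]), "n1": (0, [0.5, 0.1]), "n2": (0, [0.1, 0.05])}` the per-protein scores are 0.9, 0.4, 0.5, and 0.1, and three of the four positive/negative pairs are ranked correctly, giving an AUC of 0.75.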

14 ATP Binding Prediction for a Protein Kinase

15 Acknowledgments
• Russ Altman for supporting the research
• Altman group
• LinkedIn for sending me to GHC

16 Got Feedback? Rate and review the session using the GHC Mobile App. To download, visit www.gracehopper.org

