Using machine learning to predict binding sites in proteins. Jenelle Bray, Stanford University. October 10, 2014. #GHC14

Presentation transcript:


Protein Function
- Proteins are biological molecules that:
  - Catalyze metabolic reactions
  - Replicate DNA
  - Transport molecules
  - Respond to stimuli

Protein Structure

Protein Binding Sites (figure source: UC Davis ChemWiki)

Goal: Predict Where ATP Binds
- Adenosine triphosphate (ATP) is the primary energy currency of the cell
- It transports chemical energy within the cell for most energy-requiring reactions

ATP Model Based on FEATURE
- Builds 3D models of the local environment around a protein site, given training sets
- Calculates chemical properties at varying radial distances from the site, and creates a vector containing the value of each property in each radial volume
- Constructs a Naïve Bayes model by comparing the distribution of feature vectors between positive and negative sites

References:
- Liang MP, Banatao DR, Klein TE, Brutlag DL, Altman RB. "WebFEATURE: an interactive web tool for identifying and visualizing functional sites on macromolecular structures." Nucleic Acids Res Jul 1;31(13):
- Wei L, Altman RB. "Recognizing protein binding sites using statistical descriptions of their 3D environments." Pac Symp Biocomput. 1998:
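The radial-volume vector described on this slide can be sketched roughly as follows. This is an illustration only, not FEATURE's actual code: the function name `shell_feature_vector`, the property names, and the shell count and width are all assumptions chosen for the example.

```python
import math

def shell_feature_vector(site, atoms, properties, n_shells=6, shell_width=1.25):
    """Sum each property over concentric radial shells around `site`.

    atoms: list of (xyz, {property_name: value}) tuples.
    n_shells and shell_width are illustrative defaults (assumptions).
    Returns a flat list ordered shell by shell, property by property.
    """
    vector = [0.0] * (n_shells * len(properties))
    for xyz, values in atoms:
        shell = int(math.dist(site, xyz) // shell_width)
        if shell >= n_shells:
            continue  # atom lies outside the outermost shell
        for j, name in enumerate(properties):
            vector[shell * len(properties) + j] += values.get(name, 0.0)
    return vector

# Toy atoms with two hypothetical properties.
atoms = [
    ((0.5, 0.0, 0.0), {"is_oxygen": 1.0, "charge": -0.5}),
    ((0.0, 2.0, 0.0), {"is_oxygen": 0.0, "charge": 0.25}),
    ((9.0, 0.0, 0.0), {"is_oxygen": 1.0, "charge": -0.5}),  # too far away
]
vec = shell_feature_vector((0.0, 0.0, 0.0), atoms, ["is_oxygen", "charge"])
```

Vectors like `vec`, computed at known positive and negative sites, are what a Naïve Bayes model would then compare property by property and shell by shell.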

Extending the Use of FEATURE
- So far, FEATURE has only been used to predict a protein functional site or a single ion binding site, never a whole small molecule (ligand)
- We want to combine FEATURE models into one overall model that predicts ATP binding

Training Set
- All PDB entries (experimental 3D protein structure files) with ATP bound are clustered at 30% sequence similarity, and one protein from each cluster goes into the positive training set, giving 190 proteins
- For the negative training set, proteins with ligands that do not contain any part of ATP are selected, then also clustered at 30% similarity, giving 3345 proteins
- 20% of the training data is held out for validation
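The 20% hold-out could look something like the sketch below. The slide does not say how the split was drawn, so the random shuffle, the fixed seed, and the helper name `split_holdout` are assumptions, and the protein identifiers are placeholders.

```python
import random

def split_holdout(items, holdout_fraction=0.2, seed=0):
    """Shuffle `items` and return (training, held_out) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_holdout = round(len(shuffled) * holdout_fraction)
    return shuffled[n_holdout:], shuffled[:n_holdout]

# Placeholder identifiers standing in for the 190 positive proteins.
positives = [f"pos_{i}" for i in range(190)]
train_pos, valid_pos = split_holdout(positives)
```

The same helper would be applied to the negative set, so that both classes appear in the validation data.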

Combining Atomic Models
- Build individual FEATURE models for 3 atoms in each of the 3 sections of ATP
- The 9 atomic models need to be combined into one overall molecular model
- Train a logistic regression model with the atomic FEATURE scores as features

ATP Docking
- We want to train the model on ATP poses that can actually fit in a binding pocket
- For positive proteins, calculate the FEATURE score for each of the 9 atoms in the crystal-structure ATP
- For negative proteins, use AutoDock Vina to dock 1000 ATP poses into each protein
- Do this for a random sample of negatives equal in size to the number of positive proteins

Choosing ATP Poses for Training
- For each negative protein, calculate the FEATURE scores of the nine atoms for all 1000 ATP poses, then choose the pose with the highest sum of (normalized) individual scores
  - This ensures the model can distinguish good ATP poses in non-ATP-binding proteins from those in real ATP-binding proteins
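A minimal sketch of the pose-selection step follows. The slide does not say how the scores were normalized, so the per-atom z-score across a protein's poses used here is an assumption, as is the function name `best_pose_index`. The toy data uses 3 atoms and 3 poses instead of 9 atoms and 1000 poses.

```python
from statistics import mean, pstdev

def best_pose_index(pose_scores):
    """Return the index of the pose with the highest summed normalized score.

    pose_scores: list of poses, each a list of per-atom scores.
    Normalization (assumed): z-score of each atom's score across all poses.
    """
    n_atoms = len(pose_scores[0])
    columns = [[pose[a] for pose in pose_scores] for a in range(n_atoms)]
    # Fall back to 1.0 when an atom's score is constant across poses.
    stats = [(mean(col), pstdev(col) or 1.0) for col in columns]
    totals = [
        sum((pose[a] - stats[a][0]) / stats[a][1] for a in range(n_atoms))
        for pose in pose_scores
    ]
    return max(range(len(totals)), key=totals.__getitem__)

poses = [
    [1.0, 10.0, 0.0],
    [3.0, 30.0, 1.0],  # highest score for every atom
    [2.0, 20.0, 0.5],
]
best = best_pose_index(poses)
```

Normalizing before summing keeps an atom whose scores span a wide range from dominating the total.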

Logistic Regression Model
- Build a logistic regression model from the 9 individual atomic FEATURE scores for each protein in the training set
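To make the combination step concrete, here is a small logistic regression trained by plain gradient descent on nine scores per protein. It is a sketch under stated assumptions: the training values are made up, the learning rate and epoch count are arbitrary, and the talk does not say which fitting procedure or library was used.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.1, epochs=500):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def predict_proba(w, b, x):
    """Probability that a pose with atomic scores `x` is a true ATP site."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Made-up data: one row of 9 atomic scores per protein, label 1 = binds ATP.
X = [[1.0] * 9, [0.8] * 9, [-1.0] * 9, [-0.7] * 9]
y = [1, 1, 0, 0]
w, b = train_logistic(X, y)
p_high = predict_proba(w, b, [0.9] * 9)
p_low = predict_proba(w, b, [-0.9] * 9)
```

The fitted weights indicate how much each atomic model contributes to the overall molecular score.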

Model Validation
- Dock 1000 poses into all training proteins (positive and negative)
- Use the logistic regression model to score and rank every pose, and choose the highest-scoring pose for each protein
- Validation AUC = 0.83
- This compares favorably to dock energy (a physics-based model), with AUC = 0.74
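The validation procedure can be sketched as: keep each protein's best pose score, then compute the area under the ROC curve over proteins. The scores and protein names below are invented for illustration, and the pairwise AUC formula is one standard way to compute it, not necessarily the one used in the talk.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Invented model scores for the docked poses of each protein.
pose_scores = {
    "protA": [0.2, 0.9, 0.5],
    "protB": [0.4, 0.1],
    "protC": [0.35, 0.3],
    "protD": [0.8, 0.6],
}
binds_atp = {"protA": 1, "protB": 1, "protC": 0, "protD": 0}

names = sorted(pose_scores)
best_scores = [max(pose_scores[n]) for n in names]  # top pose per protein
labels = [binds_atp[n] for n in names]
auc = roc_auc(labels, best_scores)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the reported 0.83 and 0.74 are compared.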

ATP Binding Prediction for a Protein Kinase

Acknowledgments
- Russ Altman for supporting the research
- Altman group
- LinkedIn for sending me to GHC
