Machine Learning to Predict Experimental Protein-Ligand Complexes

Slides:



Advertisements
Similar presentations
Random Forest Predrag Radenković 3237/10
Advertisements

CPSC 502, Lecture 15Slide 1 Introduction to Artificial Intelligence (AI) Computer Science cpsc502, Lecture 15 Nov, 1, 2011 Slide credit: C. Conati, S.
Improving enrichment rates A practical solution to an impractical problem Noel O’Boyle Cambridge Crystallographic Data Centre
Indian Statistical Institute Kolkata
Background Goals Methods Results Conclusions Implications.
Decision Tree Rong Jin. Determine Milage Per Gallon.
Lesson learnt from the UCSD datamining contest Richard Sia 2008/10/10.
Sparse vs. Ensemble Approaches to Supervised Learning
Geometric Crossovers for Supervised Motif Discovery Rolv Seehuus NTNU.
ML ALGORITHMS. Algorithm Types Classification (supervised) Given -> A set of classified examples “instances” Produce -> A way of classifying new examples.
Sparse vs. Ensemble Approaches to Supervised Learning
Chapter 5 Data mining : A Closer Look.
Machine Learning Usman Roshan Dept. of Computer Science NJIT.
Classifiers, Part 3 Week 1, Video 5 Classification  There is something you want to predict (“the label”)  The thing you want to predict is categorical.
Attention Deficit Hyperactivity Disorder (ADHD) Student Classification Using Genetic Algorithm and Artificial Neural Network S. Yenaeng 1, S. Saelee 2.
INTRODUCTION TO MACHINE LEARNING. $1,000,000 Machine Learning  Learn models from data  Three main types of learning :  Supervised learning  Unsupervised.
Overcoming the Curse of Dimensionality in a Statistical Geometry Based Computational Protein Mutagenesis Majid Masso Bioinformatics and Computational Biology.
From Genomic Sequence Data to Genotype: A Proposed Machine Learning Approach for Genotyping Hepatitis C Virus Genaro Hernandez Jr CMSC 601 Spring 2011.
Methodology Qiang Yang, MTM521 Material. A High-level Process View for Data Mining 1. Develop an understanding of application, set goals, lay down all.
Today Ensemble Methods. Recap of the course. Classifier Fusion
Recursive Binary Partitioning Old Dogs with New Tricks KDD Conference 2009 David J. Slate and Peter W. Frey.
An Ensemble of Three Classifiers for KDD Cup 2009: Expanded Linear Model, Heterogeneous Boosting, and Selective Naive Bayes Members: Hung-Yi Lo, Kai-Wei.
CONFIDENTIAL1 Hidden Decision Trees to Design Predictive Scores – Application to Fraud Detection Vincent Granville, Ph.D. AnalyticBridge October 27, 2009.
Data Mining and Decision Support
Finding τ → μ−μ−μ+ Decays at LHCb with Data Mining Algorithms
Ubiquitination Sites Prediction Dah Mee Ko Advisor: Dr.Predrag Radivojac School of Informatics Indiana University May 22, 2009.
Risk Solutions & Research © Copyright IBM Corporation 2005 Default Risk Modelling : Decision Tree Versus Logistic Regression Dr.Satchidananda S Sogala,Ph.D.,
RECITATION 4 MAY 23 DPMM Splines with multiple predictors Classification and regression trees.
Kaggle Competition Rossmann Store Sales.
Eco 6380 Predictive Analytics For Economists Spring 2016 Professor Tom Fomby Department of Economics SMU.
The goal of the project is to predict the survival of passengers based off a set of data. To do this we train a prediction system.
2014 Using machine learning to predict binding sites in proteins Jenelle Bray Stanford University October 10, 2014 #GHC
PREDICTING SONG HOTNESS
Does one size really fit all? Evaluating classifiers in a Bag-of-Visual-Words classification Christian Hentschel, Harald Sack Hasso Plattner Institute.
Machine Learning Usman Roshan Dept. of Computer Science NJIT.
A new protein-protein docking scoring function based on interface residue properties Reporter: Yu Lun Kuo (D )
BNFO 615 Fall 2016 Usman Roshan NJIT. Outline Machine learning for bioinformatics – Basic machine learning algorithms – Applications to bioinformatics.
Kelci J. Miclaus, PhD Advanced Analytics R&D Manager JMP Life Sciences
Combining Models Foundations of Algorithms and Machine Learning (CS60020), IIT KGP, 2017: Indrajit Bhattacharya.
Data Mining, Machine Learning, Data Analysis, etc. scikit-learn
Machine Learning with Spark MLlib
Introduction to Machine Learning
Evaluating Classifiers
Introduction to Machine Learning and Tree Based Methods
Project 4: Facial Image Analysis with Support Vector Machines
Balgobin Y., Telecom ParisTech
Basic machine learning background with Python scikit-learn
Protein Structure Prediction and Protein Homology modeling
Molecular Docking Profacgen. The interactions between proteins and other molecules play important roles in various biological processes, including gene.
Introduction Feature Extraction Discussions Conclusions Results
Extra Tree Classifier-WS3 Bagging Classifier-WS3
Direct or Remotely sensed
Identifying Confusion from Eye-Tracking Data
Building a predictive model to enhance students' self-driven engagement Moletsane Moletsane T: +27(0) | |
Overfitting and Underfitting
Machine Learning in Practice Lecture 22
Data Mining, Machine Learning, Data Analysis, etc. scikit-learn
Data Mining, Machine Learning, Data Analysis, etc. scikit-learn
Ensemble learning Reminder - Bagging of Trees Random Forest
Model generalization Brief summary of methods
Analysis for Predicting the Selling Price of Apartments Pratik Nikte
MIS2502: Data Analytics Classification Using Decision Trees
Predicting Loan Defaults
Regression and Clinical prediction models
Protein structure prediction
Introduction to Machine learning
Manisha Panta, Avdesh Mishra, Md Tamjidul Hoque, Joel Atallah
Machine Learning in Business John C. Hull
Advisor: Dr.vahidipour Zahra salimian Shaghayegh jalali Dec 2017
Machine Learning for Cyber
Presentation transcript:

Machine Learning to Predict Experimental Protein-Ligand Complexes Hyunji Kim and Sarah Walworth Supervising Faculty: Jun Pei and Lin Song Mentor: Kenny Merz

Background Ligand Protein Protein is a large molecules composed of one or more long chains of amino acids Ligand is any molecule or atom which binds reversibly to a proten https://www.pinterest.com/pin/814447913832516926/ https://www.pinterest.com/pin/814447913832516926/

Main Objective Traditional Random Forest Scoring Functions 60% The overall purpose of our data is to improve the ability to predict native crystal structure General traditional scoring functions have about 60 % accuracy Therefore for our project, we will be using random forest to improve this accuracy

766 Protein-Ligand Crystal Structure Main Objective The overall purpose of our data is to improve the ability to predict native crystal structure General traditional scoring functions have about 60 % accuracy Therefore for our project, we will be using random forest to improve this accuracy 766 Protein-Ligand Crystal Structure

Schrodinger Glide Software Data Generation Insert video here! Schrodinger Glide Software We also applied Garf to approximate potential energy function

Machine Learning: Random Forest RF is created by having a large collection of decision trees Each tree gives a distinct classification and forest will choose the classification having the most votes over all the other trees https://mapr.com/blog/predicting-loan-credit-risk-using-apache-spark-machine-learning-random-forests/ https://mapr.com/blog/predicting-loan-credit-risk-using-apache-spark-machine-learning-random-forests/

Machine Learning: Random Forest •Generated using the Skit-Lean tool •5-fold cross validation to prevent overfitting •Randomly select 70% training set and 30% test set. •Each RF decision tree classify ‘0’ if the rank of native ligand is less than the rank of each ligand decoy and classify a ‘1’ if the rank of native ligand is greater than the rank of each ligand decoy. (or use equation) •Outcome: training and testing accuracies https://mapr.com/blog/predicting-loan-credit-risk-using-apache-spark-machine-learning-random-forests/

Machine Learning: Random Forest Confusion Matrix •Generated using the Skit-Lean tool •5-fold cross validation to prevent overfitting •Randomly select 70% training set and 30% test set. •Each RF decision tree classify ‘0’ if the rank of native ligand is less than the rank of each ligand decoy and classify a ‘1’ if the rank of native ligand is greater than the rank of each ligand decoy. (or use equation) •Outcome: training and testing accuracies https://mapr.com/blog/predicting-loan-credit-risk-using-apache-spark-machine-learning-random-forests/

Hyperparameter Tuning: Grid Search The main purpose of hyperparameter tuning is to optimize performance We will “tune” # of decision trees (1000 trees or 2000 trees) and # of features considered for selecting a node (branched in each tree will be adjustified based on the number of trees in the model) With this best optimal setting is to try tune different combinations and evaluate best performance model https://giphy.com/gifs/radio-1fDlDkl86AxVK http://www.clipartpanda.com/categories/tree-clip-art-transparent-background http://www.rivex.co.uk/Online-Manual/RivEX-Online-Manual.html?Rivernetworkspecifications.html https://giphy.com/gifs/radio-1fDlDkl86AxVK http://www.clipartpanda.com/categories/tree-clip-art-transparent-background http://www.rivex.co.uk/Online-Manual/RivEX-Online-Manual.html?Rivernetworkspecifications.html

Scikit-Learn: Grid Search Results

Final Random Forest Parameter

Chemscore ASP Expected number of interactions of atoms in a defined radius Summation over all protein-ligand atom combinations Accounts for physical factors and includes a scaling https://towardsdatascience.com/how-are-logistic-regression-ordinary-least-squares-regression-related-1deab32d79f5

Astex Statistical Potential (ASP) Expected number of interactions of atoms in a defined radius Summation over all protein-ligand atom combinations Accounts for physical factors and includes a scaling https://openclipart.org/detail/282566/grin-smiley

Astex Statistical Potential (ASP) Expected number of interactions of atoms in a defined radius Summation over all protein-ligand atom combinations Accounts for physical factors and includes a scaling

Results

Ongoing Work https://www.pinterest.com/pin/499829258622082318/

Applications https://elm.umaryland.edu/tag/new-drug-design/ World Cup 2018: https://www.google.com/

https://tenor.com/search/thank-you-gifs