Machine Learning to Predict Experimental Protein-Ligand Complexes

Slides:

Advertisements

Similar presentations

Random Forest Predrag Radenković 3237/10

Advertisements

CPSC 502, Lecture 15Slide 1 Introduction to Artificial Intelligence (AI) Computer Science cpsc502, Lecture 15 Nov, 1, 2011 Slide credit: C. Conati, S.

Improving enrichment rates A practical solution to an impractical problem Noel O’Boyle Cambridge Crystallographic Data Centre

Indian Statistical Institute Kolkata

Background Goals Methods Results Conclusions Implications.

Decision Tree Rong Jin. Determine Milage Per Gallon.

Lesson learnt from the UCSD datamining contest Richard Sia 2008/10/10.

Sparse vs. Ensemble Approaches to Supervised Learning

Geometric Crossovers for Supervised Motif Discovery Rolv Seehuus NTNU.

ML ALGORITHMS. Algorithm Types Classification (supervised) Given -> A set of classified examples “instances” Produce -> A way of classifying new examples.

Sparse vs. Ensemble Approaches to Supervised Learning

Chapter 5 Data mining : A Closer Look.

Machine Learning Usman Roshan Dept. of Computer Science NJIT.

Classifiers, Part 3 Week 1, Video 5 Classification  There is something you want to predict (“the label”)  The thing you want to predict is categorical.

Attention Deficit Hyperactivity Disorder (ADHD) Student Classification Using Genetic Algorithm and Artificial Neural Network S. Yenaeng 1, S. Saelee 2.

INTRODUCTION TO MACHINE LEARNING. $1,000,000 Machine Learning  Learn models from data  Three main types of learning :  Supervised learning  Unsupervised.

Overcoming the Curse of Dimensionality in a Statistical Geometry Based Computational Protein Mutagenesis Majid Masso Bioinformatics and Computational Biology.

From Genomic Sequence Data to Genotype: A Proposed Machine Learning Approach for Genotyping Hepatitis C Virus Genaro Hernandez Jr CMSC 601 Spring 2011.

Methodology Qiang Yang, MTM521 Material. A High-level Process View for Data Mining 1. Develop an understanding of application, set goals, lay down all.

Today Ensemble Methods. Recap of the course. Classifier Fusion

Recursive Binary Partitioning Old Dogs with New Tricks KDD Conference 2009 David J. Slate and Peter W. Frey.

An Ensemble of Three Classifiers for KDD Cup 2009: Expanded Linear Model, Heterogeneous Boosting, and Selective Naive Bayes Members: Hung-Yi Lo, Kai-Wei.

CONFIDENTIAL1 Hidden Decision Trees to Design Predictive Scores – Application to Fraud Detection Vincent Granville, Ph.D. AnalyticBridge October 27, 2009.

Data Mining and Decision Support

Finding τ → μ−μ−μ+ Decays at LHCb with Data Mining Algorithms

Ubiquitination Sites Prediction Dah Mee Ko Advisor: Dr.Predrag Radivojac School of Informatics Indiana University May 22, 2009.

Risk Solutions & Research © Copyright IBM Corporation 2005 Default Risk Modelling : Decision Tree Versus Logistic Regression Dr.Satchidananda S Sogala,Ph.D.,

RECITATION 4 MAY 23 DPMM Splines with multiple predictors Classification and regression trees.

Kaggle Competition Rossmann Store Sales.

Eco 6380 Predictive Analytics For Economists Spring 2016 Professor Tom Fomby Department of Economics SMU.

The goal of the project is to predict the survival of passengers based off a set of data. To do this we train a prediction system.

2014 Using machine learning to predict binding sites in proteins Jenelle Bray Stanford University October 10, 2014 #GHC

PREDICTING SONG HOTNESS

Does one size really fit all? Evaluating classifiers in a Bag-of-Visual-Words classification Christian Hentschel, Harald Sack Hasso Plattner Institute.

Machine Learning Usman Roshan Dept. of Computer Science NJIT.

A new protein-protein docking scoring function based on interface residue properties Reporter: Yu Lun Kuo (D )

BNFO 615 Fall 2016 Usman Roshan NJIT. Outline Machine learning for bioinformatics – Basic machine learning algorithms – Applications to bioinformatics.

Kelci J. Miclaus, PhD Advanced Analytics R&D Manager JMP Life Sciences

Combining Models Foundations of Algorithms and Machine Learning (CS60020), IIT KGP, 2017: Indrajit Bhattacharya.

Data Mining, Machine Learning, Data Analysis, etc. scikit-learn

Machine Learning with Spark MLlib

Introduction to Machine Learning

Evaluating Classifiers

Introduction to Machine Learning and Tree Based Methods

Project 4: Facial Image Analysis with Support Vector Machines

Balgobin Y., Telecom ParisTech

Basic machine learning background with Python scikit-learn

Protein Structure Prediction and Protein Homology modeling

Molecular Docking Profacgen. The interactions between proteins and other molecules play important roles in various biological processes, including gene.

Introduction Feature Extraction Discussions Conclusions Results

Extra Tree Classifier-WS3 Bagging Classifier-WS3

Direct or Remotely sensed

Identifying Confusion from Eye-Tracking Data

Building a predictive model to enhance students' self-driven engagement Moletsane Moletsane T: +27(0) | |

Overfitting and Underfitting

Machine Learning in Practice Lecture 22

Data Mining, Machine Learning, Data Analysis, etc. scikit-learn

Data Mining, Machine Learning, Data Analysis, etc. scikit-learn

Ensemble learning Reminder - Bagging of Trees Random Forest

Model generalization Brief summary of methods

Analysis for Predicting the Selling Price of Apartments Pratik Nikte

MIS2502: Data Analytics Classification Using Decision Trees

Predicting Loan Defaults

Regression and Clinical prediction models

Protein structure prediction

Introduction to Machine learning

Manisha Panta, Avdesh Mishra, Md Tamjidul Hoque, Joel Atallah

Machine Learning in Business John C. Hull

Advisor: Dr.vahidipour Zahra salimian Shaghayegh jalali Dec 2017

Machine Learning for Cyber

Presentation transcript:

Machine Learning to Predict Experimental Protein-Ligand Complexes Hyunji Kim and Sarah Walworth Supervising Faculty: Jun Pei and Lin Song Mentor: Kenny Merz

Background Ligand Protein Protein is a large molecules composed of one or more long chains of amino acids Ligand is any molecule or atom which binds reversibly to a proten https://www.pinterest.com/pin/814447913832516926/ https://www.pinterest.com/pin/814447913832516926/

Main Objective Traditional Random Forest Scoring Functions 60% The overall purpose of our data is to improve the ability to predict native crystal structure General traditional scoring functions have about 60 % accuracy Therefore for our project, we will be using random forest to improve this accuracy

766 Protein-Ligand Crystal Structure Main Objective The overall purpose of our data is to improve the ability to predict native crystal structure General traditional scoring functions have about 60 % accuracy Therefore for our project, we will be using random forest to improve this accuracy 766 Protein-Ligand Crystal Structure

Schrodinger Glide Software Data Generation Insert video here! Schrodinger Glide Software We also applied Garf to approximate potential energy function

Machine Learning: Random Forest RF is created by having a large collection of decision trees Each tree gives a distinct classification and forest will choose the classification having the most votes over all the other trees https://mapr.com/blog/predicting-loan-credit-risk-using-apache-spark-machine-learning-random-forests/ https://mapr.com/blog/predicting-loan-credit-risk-using-apache-spark-machine-learning-random-forests/

Machine Learning: Random Forest •Generated using the Skit-Lean tool •5-fold cross validation to prevent overfitting •Randomly select 70% training set and 30% test set. •Each RF decision tree classify ‘0’ if the rank of native ligand is less than the rank of each ligand decoy and classify a ‘1’ if the rank of native ligand is greater than the rank of each ligand decoy. (or use equation) •Outcome: training and testing accuracies https://mapr.com/blog/predicting-loan-credit-risk-using-apache-spark-machine-learning-random-forests/

Machine Learning: Random Forest Confusion Matrix •Generated using the Skit-Lean tool •5-fold cross validation to prevent overfitting •Randomly select 70% training set and 30% test set. •Each RF decision tree classify ‘0’ if the rank of native ligand is less than the rank of each ligand decoy and classify a ‘1’ if the rank of native ligand is greater than the rank of each ligand decoy. (or use equation) •Outcome: training and testing accuracies https://mapr.com/blog/predicting-loan-credit-risk-using-apache-spark-machine-learning-random-forests/

Hyperparameter Tuning: Grid Search The main purpose of hyperparameter tuning is to optimize performance We will “tune” # of decision trees (1000 trees or 2000 trees) and # of features considered for selecting a node (branched in each tree will be adjustified based on the number of trees in the model) With this best optimal setting is to try tune different combinations and evaluate best performance model https://giphy.com/gifs/radio-1fDlDkl86AxVK http://www.clipartpanda.com/categories/tree-clip-art-transparent-background http://www.rivex.co.uk/Online-Manual/RivEX-Online-Manual.html?Rivernetworkspecifications.html https://giphy.com/gifs/radio-1fDlDkl86AxVK http://www.clipartpanda.com/categories/tree-clip-art-transparent-background http://www.rivex.co.uk/Online-Manual/RivEX-Online-Manual.html?Rivernetworkspecifications.html

Scikit-Learn: Grid Search Results

Final Random Forest Parameter

Chemscore ASP Expected number of interactions of atoms in a defined radius Summation over all protein-ligand atom combinations Accounts for physical factors and includes a scaling https://towardsdatascience.com/how-are-logistic-regression-ordinary-least-squares-regression-related-1deab32d79f5

Astex Statistical Potential (ASP) Expected number of interactions of atoms in a defined radius Summation over all protein-ligand atom combinations Accounts for physical factors and includes a scaling https://openclipart.org/detail/282566/grin-smiley

Astex Statistical Potential (ASP) Expected number of interactions of atoms in a defined radius Summation over all protein-ligand atom combinations Accounts for physical factors and includes a scaling

Results

Ongoing Work https://www.pinterest.com/pin/499829258622082318/

Applications https://elm.umaryland.edu/tag/new-drug-design/ World Cup 2018: https://www.google.com/

https://tenor.com/search/thank-you-gifs