…ask more of your data: Bayesian Learning
Build a model which estimates the likelihood that a given data sample is from a "good" subset of a larger set.


Slide 1: Bayesian Learning
Build a model which estimates the likelihood that a given data sample is from a "good" subset of a larger set of samples (classification learning).
SciTegic uses modified Naïve Bayesian statistics:
– Efficient: scales linearly with large data sets
– Robust: works for a few as well as many "good" examples
– Unsupervised: no tuning parameters needed
– Multimodal: can model broad classes of compounds; multiple modes of action can be represented in a single model

Slide 2: Learn Good from Bad
"Learn Good from Bad" examines what distinguishes "good" from "baseline" compounds:
– Molecular properties (molecular weight, AlogP, etc.)
– Molecular fingerprints
[Figure: example "baseline" and "good" structures]

Slide 3: Learning: "Learn Good From Bad"
The user provides a name for the new component and a "Test for good", e.g.:
– Activity > 0.5
– Conclusion EQ 'CA'
The user specifies properties:
– Typical: fingerprints, AlogP, donors/acceptors, number of rotatable bonds, etc.
The model is the new component.
The component calculates a number: the larger the number, the more likely a sample is "good".

Slide 4: Using the Model
The model can be used to prioritize samples for screening, or to search vendor libraries for new candidates for testing.
The quality of a model can be evaluated:
– Split the data into training and test sets
– Build the model using the training set
– Sort the test set by model value
– Plot how rapidly hits are found in the sorted list
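The evaluation recipe on this slide can be sketched in a few lines of Python. This is an illustrative sketch, not Pipeline Pilot code: `fit` is a hypothetical stand-in for whatever builds the Bayesian model, and must return a scoring function.

```python
import random

def holdout_evaluation(samples, labels, fit, test_fraction=0.5, seed=0):
    """Split the data, fit on the training half, sort the test half by
    model score, and return the 1-based ranks at which hits ("good"
    samples) appear in the sorted list."""
    rng = random.Random(seed)
    paired = list(zip(samples, labels))
    rng.shuffle(paired)
    cut = int(len(paired) * (1 - test_fraction))
    train, test = paired[:cut], paired[cut:]
    score = fit(train)  # `fit` must return a callable: sample -> number
    ranked = sorted(test, key=lambda p: score(p[0]), reverse=True)
    return [i + 1 for i, (_, good) in enumerate(ranked) if good]
```

Plotting the returned ranks as a cumulative count of hits versus list position gives exactly the "how rapidly hits are found" picture the slide describes.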

Slide 5: Using a Learned Model
The model appears on your tab in LearnedProperties:
– Drag it into a protocol to use it "by value"
– Refer to it by name to use it "by reference"

Slide 6: Fingerprints

Slide 7: ECFP: Extended Connectivity Fingerprints
A new class of fingerprints for molecular characterization:
– Each bit represents the presence of a structural (not substructural) feature
– 4 billion different bits
– Multiple levels of abstraction contained in a single fingerprint
– Different starting atom codes lead to different fingerprints (ECFP, FCFP, ...)
– A typical molecule generates hundreds to thousands of bits
– A typical library generates 100K - 10M different bits

Slide 8: Advantages
– Fast to calculate
– Represents a much larger number of features
– Features are not "pre-selected"
– Represents tertiary/quaternary information (as opposed to path-based fingerprints)
– Bits can be "interpreted"

Slide 9: FCFP: Initial Atom Codes

Slide 10: ECFP: Generating the Fingerprint
– The iteration is repeated the desired number of times; each iteration extends the diameter by two bonds
– Codes from all iterations are collected
– Duplicate bits may be removed
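The generation steps above can be illustrated with a toy Python sketch of iterative neighborhood hashing. This is a simplified illustration, not the actual ECFP algorithm: the molecule is a hypothetical adjacency list of initial atom codes, and Python's built-in `hash` (masked to 32 bits, giving the ~4 billion possible bits the earlier slide mentions) stands in for the real hashing scheme.

```python
def circular_fingerprint(atom_codes, bonds, iterations=2):
    """Toy ECFP-style generation: each round, an atom's code is
    re-hashed together with its sorted neighbor codes, so each
    iteration describes a substructure two bonds wider in diameter."""
    # Build an adjacency list from the bond list.
    neighbors = {i: [] for i in range(len(atom_codes))}
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)
    codes = list(atom_codes)
    features = set(codes)  # iteration 0: the initial atom codes
    for _ in range(iterations):
        new_codes = []
        for i, code in enumerate(codes):
            env = (code, tuple(sorted(codes[j] for j in neighbors[i])))
            new_codes.append(hash(env) & 0xFFFFFFFF)  # 32-bit feature id
        codes = new_codes
        features.update(codes)  # duplicate bits collapse in the set
    return features
```

Collecting the codes from every iteration into one set mirrors the slide: all abstraction levels live in a single fingerprint, and duplicates are removed.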

Slide 11: ECFP: Extending the Initial Atom Codes
Fingerprint bits indicate the presence and absence of certain structural features.
Fingerprints do not depend on a predefined set of substructural features.
Each iteration adds bits that represent larger and larger structures.
[Figure: substructures grown at iteration 0, iteration 1, and iteration 2]

Slide 12: The Statistics Table: Features
A feature is a binary attribute of a data record:
– For molecules, it may be derived from a property range or a fingerprint bit
A molecule typically contains a few hundred features.
A count of each feature is kept:
– Over all the samples
– Over all samples that pass the test for good
The normalized probability is log(Laplacian-corrected probability).
The normalized probabilities are summed over all features to give the relative score.

Slide 13: Normalized Probability
Given a set of N samples, and that some subset A of them are good ("active"), we estimate for a new compound:
P(good) ≈ A / N
Given a set of binary features F_i, then for a given feature F:
– It appears in N_F samples
– It appears in A_F good samples
Can we estimate P(good | F) ≈ A_F / N_F?
(Problem: the error gets worse as N_F becomes small.)

Slide 14: Quiz Time
We have an HTS screen with 1% actives, and two new samples X and Y to test.
For each sample, we are given the results from one feature (F_X and F_Y).
Which one is most likely to be active?

Slide 15: Question 1
Sample X: A_FX = 0, N_FX = 100
Sample Y: A_FY = 100, N_FY = 100

Slide 16: Question 2
Sample X: A_FX = 0, N_FX = 100
Sample Y: A_FY = 1, N_FY = 100

Slide 17: Question 3
Sample X: A_FX = 0, N_FX = 100
Sample Y: A_FY = 0, N_FY = 0

Slide 18: Question 4
Sample X: A_FX = 2, N_FX = 100
Sample Y: A_FY = 0, N_FY = 0

Slide 19: Question 5
Sample X: A_FX = 2, N_FX = 4
Sample Y: A_FY = 200, N_FY = 400

Slide 20: Question 6
Sample X: A_FX = 0, N_FX = 100
Sample Y: A_FY = 0, N_FY = 1,000,000

Slide 21: Normalized Probability
Thought experiment: what is the probability of a feature which we have seen in NO samples (i.e., a novel feature)?
Hint: assume most features have no connection to the reason for "goodness"...

Slide 22: Normalized Probability
Thought experiment: for a feature seen in NO samples (a novel feature), the best guess would be P(good).
Conclusion:
– We want an estimator with P(good | F) → P(good) as N_F becomes small
– Add some "virtual" samples (with probability P(good)) to every bin

Slide 23: Normalized Probability
Our new estimate, after adding K virtual samples:
P'(good | F) = (A_F + P(good)·K) / (N_F + K)
– P'(good | F) → P(good) as N_F → 0
– P'(good | F) → A_F / N_F as N_F becomes large
(If K = 1/P(good), this is the Laplacian correction.)
K is the duplication factor in our data.
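The corrected estimator translates directly into code. This is a minimal sketch of the formula above; the function name is ours, and the default K = 1/P(good) follows the slide's Laplacian-correction note.

```python
def corrected_probability(a_f, n_f, p_good, k=None):
    """Laplacian-corrected estimate:
    P'(good|F) = (A_F + P(good)*K) / (N_F + K).
    With K = 1/P(good) this is the classic Laplacian correction."""
    if k is None:
        k = 1.0 / p_good  # default K, as suggested on the slide
    return (a_f + p_good * k) / (n_f + k)
```

Note the two limits claimed on the slide: with A_F = N_F = 0 the estimate is exactly P(good), and for large N_F the virtual samples become negligible and the estimate approaches the raw ratio A_F / N_F.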

Slide 24: Normalized Probability
Final issue: how do we combine multiple features?
– Assumption: the number of features doesn't matter
– Want to limit the contribution from random features
P'''(good | F) = ((A_F + P(good)·K) / (N_F + K)) / P(good)
P_final = P'''(good | F_1) · P'''(good | F_2) · ...
Phew! (The good news: for most real-world data, the default value of K is quite satisfactory...)
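Putting the pieces together: multiplying the normalized probabilities P''' is equivalent to summing their logs, which is the "relative score" of the statistics-table slide. The sketch below assumes a hypothetical `counts` mapping from feature to (A_F, N_F) pairs; the function and argument names are illustrative.

```python
import math

def relative_score(features, counts, n_total, a_total):
    """Sum of log-normalized, Laplacian-corrected probabilities, one
    term per feature present in the sample. A novel feature (not in
    `counts`) contributes log(P(good)/P(good)) = 0, so random or
    unseen features neither help nor hurt the score."""
    p_good = a_total / n_total
    k = 1.0 / p_good  # Laplacian-correction choice of K
    score = 0.0
    for f in features:
        a_f, n_f = counts.get(f, (0, 0))
        p_corr = (a_f + p_good * k) / (n_f + k)
        score += math.log(p_corr / p_good)  # log of P'''(good | F)
    return score
```

This is why the model is robust to the "number of features doesn't matter" assumption: each feature's term is centered at zero, so only features genuinely enriched (or depleted) among the actives move the score.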

Slide 25: Validation of the Model

Slide 26: Generating Enrichment Plots
"If I prioritized my testing using this model, how well would I do?"
The graph shows % actives ("good") found vs. % tested.
Use it on a test dataset:
– That was not part of the training data
– That you already have results for
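The curve described above is straightforward to compute from model scores and known test-set outcomes. A minimal sketch (names are ours):

```python
def enrichment_curve(scores, actives):
    """Cumulative fraction of actives found vs. fraction of the
    ranked test set screened: sort by score descending, then walk the
    list accumulating hits."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_active = sum(actives)
    found, curve = 0, []
    for rank, i in enumerate(order, start=1):
        found += actives[i]
        curve.append((rank / len(scores), found / n_active))
    return curve
```

A perfect model pushes the curve to the top-left corner (all actives found in the first few percent tested); a useless model tracks the diagonal.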

Slide 27: Modeling Known Activity Classes from the World Drug Index
Training set: 25,000 randomly selected compounds from WDI
Test set: 25,000 remaining compounds from WDI + 25,000 compounds from Maybridge
Descriptors: fingerprints, ALogP, molecular properties
Build models for each activity class: progestogen, estrogen, etc.
[Figure: WDI 50K split into 25K training set; 25K WDI + 25K Maybridge test set]

Slide 28: Enrichment Plots
– Apply the activity model to compounds in the test set
– Order compounds from 'best' to 'worst'
– Plot the cumulative distribution of known actives
– Do this for each activity class

Slide 29: Enrichment Plot for High Actives

Slide 30: Choosing a Cutoff Value
Models are relative predictors:
– They suggest which samples to test first
– Not a classifier (threshold independent)
To make a classifier, you need to choose a cutoff:
– A balance between sensitivity (true positive rate) and specificity (1 - false positive rate)
– Requires human judgment
Two useful views:
– Histogram plots
– ROC (Receiver Operating Characteristic) plots

Slide 31: Choosing a Cutoff Value: Histograms
A histogram can visually show the separation of actives and nonactives using a model.

Slide 32: Choosing a Cutoff Value: ROC Plots
Derived from clinical medicine.
Shows the balance between the cost of missing a true positive and the cost of falsely accepting a negative.
The area under the curve is a measure of quality:
– 0.90 - 1.00 = excellent (A)
– 0.80 - 0.90 = good (B)
– 0.70 - 0.80 = fair (C)
– 0.60 - 0.70 = poor (D)
– 0.50 - 0.60 = fail (F)
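For illustration, the ROC area can be computed without drawing the curve at all, via the equivalent pairwise-ranking formulation: the AUC equals the probability that a randomly chosen active outscores a randomly chosen inactive. This is a generic sketch, not the deck's implementation.

```python
def roc_auc(scores, actives):
    """ROC area via pairwise ranking: the fraction of
    (active, inactive) pairs where the active outscores the
    inactive, with ties counting half."""
    pos = [s for s, a in zip(scores, actives) if a]
    neg = [s for s, a in zip(scores, actives) if not a]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

An AUC of 1.0 means perfect separation of actives from inactives; 0.5 means the model ranks no better than chance, matching the grade scale above.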

Slide 33: ROC Plot for MAO

Slide 34: Postscript: non-FP Descriptors
AlogP:
– A measure of the octanol/water partition coefficient
– A high value means the molecule "prefers" to be in octanol rather than water, i.e., it is nonpolar
– A real number
Molecular Weight:
– Total mass of all of the atoms making up the molecule
– Units are atomic mass units (a.m.u.), in which the mass of each proton or neutron is approximately 1
– A positive real number

Slide 35: Postscript: non-FP Descriptors
Num H Acceptors, Num H Donors:
– Molecules may link to each other via hydrogen bonds
– H-bonds are weaker than true chemical bonds
– H-bonds play a role in drug activity
– H donors are polar atoms such as N and O with an attached H (can "donate" a hydrogen to form an H-bond)
– H acceptors are polar atoms lacking an attached H (can "accept" a hydrogen to form an H-bond)
– Num H Acceptors and Num H Donors are counts of atoms meeting the above criteria
– Non-negative integers
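The donor/acceptor criteria above are simple enough to sketch as code. This toy version assumes molecules are given as explicit lists of (element, attached-H-count) pairs rather than real chemical structures, and implements exactly the slide's simplified rule (polar atom with attached H = donor, polar atom without = acceptor).

```python
def count_donors_acceptors(atoms):
    """Count H-bond donors and acceptors over a list of
    (element_symbol, num_attached_hydrogens) pairs, using the
    slide's criteria: polar atoms are N and O."""
    donors = sum(1 for el, h in atoms if el in ("N", "O") and h > 0)
    acceptors = sum(1 for el, h in atoms if el in ("N", "O") and h == 0)
    return donors, acceptors
```

Both counts are non-negative integers, as the slide notes; a real implementation would work from a parsed structure and handle charged atoms and tautomers, which this sketch ignores.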

Slide 36: Postscript: non-FP Descriptors
Num Rotatable Bonds:
– Certain bonds between atoms are rigid: bonds within rings, and double and triple bonds
– Others are rotatable: attached parts of the molecule can freely pivot around the bond
– Num Rotatable Bonds is the count of rotatable bonds in the molecule
– A non-negative integer