Published by Hannah Lawrence. Modified over 10 years ago.
1
Domain of Applicability
A Cluster-Based Measure of Domain of Applicability of a QSAR Model
Robert Stanforth
6 September 2005
2
Overview
- What is QSAR?
- Motivation
- Modelling the Dataset
- Measure of Distance from Domain
- Validation
D C = D D + D M + D A - c © IDBS 2005
3
What is QSAR?
- Quantitative Structure-Activity Relationships
- Biological Activity = f(Chemical Structure) + Error
- Descriptor-based QSAR: descriptors measure chemical structure, e.g. topological indices of the chemical graph
- Use multivariate linear regression: regress activity onto a high-dimensional descriptor space
- Problem of extrapolation
Figure: example chemical structures labelled with topological index values ³χc = 0, 0.289, 0.408, 0.667 and 1.802
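As a minimal sketch of the descriptor-based regression step (not the author's code), the Python below fits activity = b0 + b1·x1 + ... + bp·xp by ordinary least squares, solving the normal equations with a small Gaussian elimination so that only the standard library is needed. The function names `solve`, `fit_mlr` and `predict` are hypothetical, and the descriptor values and activities are made-up illustrative numbers.

```python
# Minimal sketch (not the author's code): multivariate linear regression of
# activity onto descriptors by ordinary least squares, via the normal
# equations (X^T X) b = X^T y. Function names and data are hypothetical.


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_mlr(descriptors, activities):
    """Return coefficients [b0, b1, ..., bp] minimising the squared error."""
    X = [[1.0] + list(row) for row in descriptors]  # prepend intercept column
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(X, activities)) for i in range(p)]
    return solve(xtx, xty)


def predict(coef, row):
    """Predicted activity for one descriptor vector."""
    return coef[0] + sum(b * x for b, x in zip(coef[1:], row))


# Hypothetical training set: two descriptors per compound, exactly linear activity.
train_x = [[0.0, 1.0], [0.5, 2.0], [1.0, 1.5], [1.5, 3.0], [2.0, 2.5]]
train_y = [1.0 + 2.0 * a - 0.5 * b for a, b in train_x]
coef = fit_mlr(train_x, train_y)
print([round(c, 3) for c in coef])  # → [1.0, 2.0, -0.5]
```

Note that `predict` returns a value for any descriptor vector, however far it lies from the training compounds; nothing in the fit flags an extrapolation, which is the problem a domain-of-applicability measure is meant to address.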
4
Motivation
- A QSAR model is only valid within the domain of its training set
- Measure membership of this domain of applicability
- Provides assurance of:
  - External test set
  - k-fold cross validation
  - Prediction
5
Existing Methods
- Bounding box
- Convex hull
- Distance to centroid
- Nearest-neighbour and k-NN methods
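For comparison, simple textbook forms of three of the existing measures can be sketched in a few lines of Python. These are assumptions about how each method is usually defined, not necessarily the variants benchmarked in the talk; the function names are hypothetical, Euclidean distance is assumed throughout, and the convex hull (which needs a computational-geometry routine) is omitted.

```python
import math


def bounding_box_inside(train, x):
    """Inside if every descriptor of x lies within the training set's min/max range."""
    for j, v in enumerate(x):
        col = [row[j] for row in train]
        if not (min(col) <= v <= max(col)):
            return False
    return True


def centroid_distance(train, x):
    """Euclidean distance from x to the mean of the training descriptors."""
    n = len(train)
    centroid = [sum(row[j] for row in train) / n for j in range(len(x))]
    return math.dist(x, centroid)


def knn_distance(train, x, k=3):
    """Mean Euclidean distance from x to its k nearest training compounds."""
    dists = sorted(math.dist(x, row) for row in train)
    return sum(dists[:k]) / k


# Hypothetical L-shaped (non-convex) training set: points along two axes.
train = [(0, 0), (1, 0), (2, 0), (3, 0), (0, 1), (0, 2), (0, 3)]
print(bounding_box_inside(train, (3, 3)))  # → True: the empty corner is "inside"
print(knn_distance(train, (3, 3)) > knn_distance(train, (0.5, 0)))  # → True
```

With this L-shaped set, the bounding box accepts the empty corner (3, 3) while the k-NN distance reports it as far from the data, and `centroid_distance` is zero at (6/7, 6/7), a point that lies off both arms of the L.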
6
K-Means for Clustering
- Use clusters to model the shape of the dataset
- The K-Means algorithm iteratively adjusts the partitioning into clusters to improve the fit of the model, reducing the total squared distance from each compound to its cluster centroid
- Computationally feasible
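The K-Means step can be sketched as Lloyd's algorithm, the standard iteration that alternates assigning each compound to its nearest centroid with moving each centroid to the mean of its cluster; neither step can increase the within-cluster sum of squared distances. The slide does not say how centroids are initialised or when iteration stops, so the random initial choice, the hypothetical `seed` parameter and the stop-when-unchanged rule below are assumptions.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid update until stable."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # assumed: random initial centroids
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c, members in enumerate(clusters):
            if members:
                dim = len(members[0])
                new_centroids.append([sum(m[j] for m in members) / len(members) for j in range(dim)])
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # assignments can no longer change
        centroids = new_centroids
    return centroids


# Hypothetical two-blob dataset in a 2-D descriptor space.
pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
print(sorted(kmeans(pts, 2)))
```

The number of clusters `k` is left to the caller here; the slides do not say how many clusters were used.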
8
Measure of Distance
- Use the K-Means model
- Based on distances to cluster centroids
- Fuzzy cluster membership
- Weighted average of the distances to the cluster centroids, weighted according to cluster membership
- Computationally efficient
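A sketch of the membership-weighted distance, under an explicit assumption: the talk does not give its fuzzy membership function, so this uses the standard fuzzy c-means form, in which membership of cluster k is proportional to d_k^(-2/(m-1)) with fuzzifier m = 2. The distance is then Σ_k u_k d_k, the weighted average of the distances to the centroids. The function names are hypothetical.

```python
import math


def fuzzy_memberships(x, centroids, m=2.0):
    """Fuzzy c-means style membership of x in each cluster (sums to 1).

    Assumption: the talk does not state its membership function; this uses
    u_k proportional to d_k ** (-2 / (m - 1)).
    """
    dists = [math.dist(x, c) for c in centroids]
    zeros = dists.count(0.0)
    if zeros:  # x sits exactly on a centroid: share membership among those centroids
        return [1.0 / zeros if d == 0.0 else 0.0 for d in dists]
    weights = [d ** (-2.0 / (m - 1.0)) for d in dists]
    total = sum(weights)
    return [w / total for w in weights]


def domain_distance(x, centroids, m=2.0):
    """Membership-weighted average of the distances from x to each centroid."""
    u = fuzzy_memberships(x, centroids, m)
    return sum(uk * math.dist(x, c) for uk, c in zip(u, centroids))


cents = [(0.0, 0.0), (4.0, 0.0)]  # hypothetical K-Means centroids
print(domain_distance((1.0, 0.0), cents))
print(domain_distance((2.0, 0.0), cents))
```

With centroids at (0, 0) and (4, 0), the point (1, 0) gets memberships 0.9 and 0.1 and a distance of 0.9 × 1 + 0.1 × 3 = 1.2, while the midpoint (2, 0) between the two clusters gets 2.0, so a gap between clusters is scored as further from the data than a point close to one cluster.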
9
Measure of Distance (continued)
Figure: contour plot of the distance measure
- The first contour defines the boundary of the applicability domain
12
Internal Validation
- Assess the stability of the distance measure
- Use k-fold cross validation:
  - Leave out one group at a time
  - Retrain the distance measure
  - Compute the mean relative change in distance of the compounds left out
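The stability check can be sketched generically: `train` is any function that builds a distance measure from a set of compounds. Two details are assumptions not stated on the slide: the change for each held-out compound is measured relative to the measure trained on all the data, and groups are formed by taking every k-th compound. `train_centroid_distance` is a hypothetical stand-in (plain distance to centroid), not the cluster-based measure itself.

```python
import math


def kfold_relative_change(data, train, k=5):
    """Mean relative change in distance for compounds left out during k-fold CV.

    train(points) must return a function mapping a descriptor vector to a distance.
    Assumption: each change is relative to the measure trained on all the data.
    """
    full_measure = train(data)
    changes = []
    for fold in range(k):
        held_out = data[fold::k]  # assumed grouping: every k-th compound
        kept = [p for i, p in enumerate(data) if i % k != fold]
        fold_measure = train(kept)  # retrain without the held-out group
        for p in held_out:
            d_full = full_measure(p)
            if d_full > 0:
                changes.append(abs(fold_measure(p) - d_full) / d_full)
    return sum(changes) / len(changes) if changes else 0.0


def train_centroid_distance(points):
    """Stand-in distance measure (distance to training centroid), for illustration only."""
    dim = len(points[0])
    centroid = [sum(p[j] for p in points) / len(points) for j in range(dim)]
    return lambda x: math.dist(x, centroid)


print(kfold_relative_change([[0.0], [2.0], [4.0], [6.0]], train_centroid_distance, k=2))
```

For the one-dimensional data [0], [2], [4], [6] with k = 2, the held-out relative changes are 1/3, 1, 1 and 1/3, giving a mean of 2/3.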
13
Internal Validation

Method          Averaged Relative Deviation
Bounding Box    53.2%
Leverage        80.5%
k-NN            83.1%
Cluster-based   43.2%
14
External Validation
- Assess the relationship between distance and prediction error
- Analyse the mean-square prediction error over:
  - all 50 new compounds
  - those inside the domain
  - those outside the domain
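A sketch of the external-validation split, assuming a compound counts as inside the domain when its distance does not exceed a threshold. The slides place the boundary at the first contour but do not give its value, so `threshold` is a hypothetical parameter, and `split_mse` is a hypothetical name.

```python
def mse(errors):
    """Mean of squared errors (NaN for an empty group)."""
    return sum(e * e for e in errors) / len(errors) if errors else float("nan")


def split_mse(predicted, observed, distances, threshold):
    """Mean-square prediction error overall, inside and outside the domain.

    Inside means distance <= threshold (an assumed convention).
    Returns (mse_all, (mse_inside, n_inside), (mse_outside, n_outside)).
    """
    errors = [p - o for p, o in zip(predicted, observed)]
    inside = [e for e, d in zip(errors, distances) if d <= threshold]
    outside = [e for e, d in zip(errors, distances) if d > threshold]
    return mse(errors), (mse(inside), len(inside)), (mse(outside), len(outside))


print(split_mse([1, 2, 3, 4], [1, 1, 3, 1], [0.2, 0.5, 0.9, 1.5], threshold=1.0))
```

With these invented inputs (not the 50 compounds from the talk), the overall MSE is 2.5, the three compounds inside give 1/3 and the one outside gives 9.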
15
External Validation: Mean Square Prediction Error

Method          All (50)   Inside Domain   Outside Domain
Bounding Box    2.76       3.08 (27)       2.40 (23)
Leverage        2.76       2.81 (48)       1.61 (2)
k-NN            2.76       2.73 (45)       3.11 (5)
Cluster-based   2.76       2.70 (46)       3.58 (4)

Numbers in parentheses are the numbers of compounds in each group.
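As a consistency check on the table, the All column should be the count-weighted mean of the inside and outside columns; for the Leverage row:

```latex
\mathrm{MSE}_{\text{all}}
  = \frac{n_{\text{in}}\,\mathrm{MSE}_{\text{in}} + n_{\text{out}}\,\mathrm{MSE}_{\text{out}}}{n_{\text{in}} + n_{\text{out}}}
  = \frac{48 \times 2.81 + 2 \times 1.61}{50}
  = \frac{138.10}{50}
  \approx 2.76
```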
16
Conclusions
- Need a quantitative measure of the applicability of a descriptor-based QSAR model to a structure
- Existing methods are all either too crude or too slow
- Our new method is computationally efficient and copes well with non-convex domains