Domain of Applicability A Cluster-Based Measure of Domain of Applicability of a QSAR Model Robert Stanforth 6 September 2005.

1 Domain of Applicability
A Cluster-Based Measure of Domain of Applicability of a QSAR Model
Robert Stanforth
6 September 2005

2 Overview
D C = D D + D M + D A - c © IDBS 2005
- What is QSAR?
- Motivation
- Modelling the Dataset
- Measure of Distance from Domain
- Validation

3 What is QSAR?
- Quantitative Structure-Activity Relationships
  - BiologicalActivity = f(ChemicalStructure) + Error
- Descriptor-based QSAR
  - Descriptors measure chemical structure
  - E.g. topological indices of the chemical graph
- Use Multivariate Linear Regression
  - Regress activity onto a high-dimensional descriptor space
- Problem of extrapolation
[Figure labels: 3 c = 0, 0.289, 0.408, 0.667, 1.802]

4 Motivation
- A QSAR model is only valid in the domain of its training set
- Measure membership of this domain of applicability
- Provides assurance of:
  - External test set
  - k-fold cross validation
  - Prediction

5 Existing Methods
- Bounding Box
- Convex Hull
- Distance to Centroid
- Nearest Neighbour and k-NN Methods
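Three of the existing methods listed above can be sketched in a few lines. This is a minimal illustration rather than the implementations compared in the talk: the function names, the Euclidean metric, and the L-shaped toy data are all assumptions made for the example. The query point shows why a bounding box can be too crude for a non-convex dataset.

```python
import math

def in_bounding_box(train, x):
    """True if every descriptor of x lies within the training min/max range."""
    for j in range(len(x)):
        column = [row[j] for row in train]
        if not (min(column) <= x[j] <= max(column)):
            return False
    return True

def centroid_distance(train, x):
    """Euclidean distance from x to the mean of the training set."""
    n = len(train)
    centroid = [sum(row[j] for row in train) / n for j in range(len(x))]
    return math.dist(centroid, x)

def knn_distance(train, x, k=1):
    """Mean Euclidean distance from x to its k nearest training points."""
    nearest = sorted(math.dist(row, x) for row in train)[:k]
    return sum(nearest) / len(nearest)

# Hypothetical L-shaped (non-convex) training set in two descriptors.
train = [(0, 0), (1, 0), (2, 0), (3, 0), (0, 1), (0, 2), (0, 3)]
query = (3, 3)  # inside the bounding box, yet far from every training point

print(in_bounding_box(train, query))
print(centroid_distance(train, query))
print(knn_distance(train, query, k=1))
```

The bounding box accepts the query even though its nearest training point is three units away, which is the kind of failure the distance-based methods are meant to catch.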

6 K-Means for Clustering
- Use clusters to model the shape of the dataset
- The K-Means algorithm iteratively adjusts the partitioning into clusters to increase the accuracy of the model
- Computationally feasible
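The iterative adjustment described above is commonly implemented as Lloyd's algorithm: assign each point to its nearest centroid, then move each centroid to the mean of its members. The sketch below assumes that variant, random initialisation from the data, and a Euclidean metric; the talk does not say which variant or settings were used.

```python
import math
import random

def k_means(points, k, iterations=50, seed=0):
    """Plain Lloyd's algorithm; returns the list of k centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumed initialisation: k random points
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c, members in enumerate(clusters):
            if members:
                dim = len(members[0])
                new_centroids.append(
                    tuple(sum(m[j] for m in members) / len(members) for j in range(dim))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # partition is stable
        centroids = new_centroids
    return centroids

# Hypothetical data: two well-separated groups in two descriptors.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
print(sorted(k_means(points, 2)))
```

Each iteration costs on the order of (number of points) x (number of clusters) distance evaluations, which is consistent with the slide's point about computational feasibility.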

7

8 Measure of Distance
- Use the K-Means model
- Base the measure on distances to cluster centroids
- Fuzzy cluster membership
- Weighted average of distances to cluster centroids, weighted according to cluster membership
- Computationally efficient
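One way to read the weighted average above is sketched below. The slide does not give the membership formula, so this example assumes the standard fuzzy c-means membership with fuzzifier 2 and a Euclidean metric; `fuzzy_memberships` and `cluster_distance` are hypothetical names, and the measure actually used in the talk may differ.

```python
import math

def fuzzy_memberships(x, centroids, fuzzifier=2.0):
    """Fuzzy c-means style memberships of x in each cluster (they sum to 1)."""
    dists = [math.dist(x, c) for c in centroids]
    # A point sitting exactly on a centroid belongs wholly to that cluster.
    if any(d == 0.0 for d in dists):
        hits = [1.0 if d == 0.0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (fuzzifier - 1.0)
    inverse = [d ** -power for d in dists]
    total = sum(inverse)
    return [w / total for w in inverse]

def cluster_distance(x, centroids, fuzzifier=2.0):
    """Membership-weighted average of the distances from x to the centroids."""
    memberships = fuzzy_memberships(x, centroids, fuzzifier)
    return sum(u * math.dist(x, c) for u, c in zip(memberships, centroids))

# Hypothetical centroids, e.g. as produced by a K-Means run.
centroids = [(0.0, 0.0), (4.0, 0.0)]
for x in [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (8.0, 0.0)]:
    print(x, cluster_distance(x, centroids))
```

Because nearby clusters receive most of the weight, the measure stays small close to any one cluster and grows as a compound moves away from all of them. Evaluating it needs only one distance per centroid, not one per training compound.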

9 Measure of Distance
- Contour plot
- First contour defines the boundary of the applicability domain

10

11

12 Internal Validation
- Assess stability of the distance measure
- Use k-fold cross validation
  - Leave out one group at a time
  - Retrain the distance measure
  - Mean relative change in distance of the compounds left out
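The procedure above can be sketched as follows. The exact definition of "relative change" is not given on the slide, so this example assumes |d_retrained - d_full| / d_full averaged over all left-out compounds, and it uses distance to centroid as a simple stand-in for the distance measure being assessed. All function names are hypothetical.

```python
import math

def centroid_distance_model(train):
    """Stand-in distance measure: returns a function giving distance to the centroid."""
    n, dim = len(train), len(train[0])
    centroid = [sum(row[j] for row in train) / n for j in range(dim)]
    return lambda x: math.dist(centroid, x)

def averaged_relative_deviation(data, fit, folds=5):
    """Mean relative change in distance for compounds left out of training."""
    full = fit(data)  # measure trained on every compound
    changes = []
    for f in range(folds):
        held_out = data[f::folds]  # compounds whose index i has i % folds == f
        kept = [row for i, row in enumerate(data) if i % folds != f]
        reduced = fit(kept)  # measure retrained without the held-out group
        for x in held_out:
            before = full(x)
            if before > 0:  # relative change is undefined at zero distance
                changes.append(abs(reduced(x) - before) / before)
    return sum(changes) / len(changes)

# Hypothetical one-descriptor dataset, split into two folds.
data = [(0.0,), (1.0,), (2.0,), (3.0,)]
print(averaged_relative_deviation(data, centroid_distance_model, folds=2))
```

A smaller value means the measure changes little when a group of compounds is withheld, which is the sense of stability the slide refers to.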

13 Internal Validation

Method          Averaged Relative Deviation
Bounding Box    53.2%
Leverage        80.5%
k-NN            83.1%
Cluster-based   43.2%

14 External Validation
- Assess the relationship between distance and prediction error
- Analyse mean-square prediction error over:
  - 50 new compounds
  - Those inside the domain
  - Those outside the domain
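The comparison above amounts to splitting the test compounds at a distance threshold and computing the mean-square error on each side. The sketch below assumes each compound is recorded as a (distance, predicted, observed) triple and that "inside" means distance at or below the threshold. The numbers are invented for illustration and are not data from the talk.

```python
def mean_square_error(pairs):
    """Mean of squared (predicted - observed) differences, or None if empty."""
    if not pairs:
        return None
    return sum((pred - obs) ** 2 for pred, obs in pairs) / len(pairs)

def split_mse(records, threshold):
    """Mean-square error over all compounds, and inside/outside the domain."""
    inside = [(p, o) for d, p, o in records if d <= threshold]
    outside = [(p, o) for d, p, o in records if d > threshold]
    everything = [(p, o) for _, p, o in records]
    return {
        "all": (mean_square_error(everything), len(everything)),
        "inside": (mean_square_error(inside), len(inside)),
        "outside": (mean_square_error(outside), len(outside)),
    }

# Hypothetical (distance, predicted, observed) records.
records = [
    (0.2, 1.0, 1.5),
    (0.5, 2.0, 2.0),
    (1.4, 3.0, 1.0),
    (2.0, 0.0, 1.0),
]
for name, (mse, count) in split_mse(records, threshold=1.0).items():
    print(name, mse, count)
```

A useful distance measure should give a lower error inside the domain than outside it.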

15 External Validation

Mean Square Prediction Error (number of compounds in parentheses)

Method          All (50)   Inside Domain   Outside Domain
Bounding Box    2.76       3.08 (27)       2.40 (23)
Leverage        2.76       2.81 (48)       1.61 (2)
k-NN            2.76       2.73 (45)       3.11 (5)
Cluster-based   2.76       2.70 (46)       3.58 (4)

16 Conclusions
- Need a quantitative measure of the applicability of a descriptor-based QSAR model to a structure
- Existing methods are all either too crude or too slow
- Our new method is computationally efficient, and copes well with non-convex domains

