AN APPROACH TO DETERMINE THE APPLICATION DOMAIN OF GROUP CONTRIBUTION MODELS

Nina Jeliazkova (1), Joanna Jaworska (2)
(1) IPP, Bulgarian Academy of Sciences, Sofia, Bulgaria
(2) Central Product Safety, Procter & Gamble, Brussels, Belgium

Abstract

There is a practical need for an automatic (computerized) procedure to determine the application domain of a QSAR model. In this paper we attempt to address this need, focusing on the application domain of group contribution methods. These methods are characterized by a high number of descriptors, i.e. high dimensionality. For feasibility reasons we propose to estimate the application domain as the parameter space bounded by the training set parameter ranges. We then demonstrate how to apply this approach in practice, using the Syracuse Research Corporation KOWWIN model as an example.

Discussion

Atom Fragment Contribution (AFC) method
- Uses counts of fragments as descriptors.
- Uses very simple fragments: each non-hydrogen atom is the core of a fragment, which minimizes the possibility of missing fragments.
- In addition to simple fragments, uses correction factors (complex fragments, always larger than a single atom).
- Two-stage multivariate regression.

The KOWWIN training set and validation set were provided by Syracuse Research Corp.

Approach
- Approximate the application domain by ranges determined from the training set:
  - fragment and correction factor ranges;
  - Log Kow range (a compound can fall outside it because its combination of fragments is out of range).
- Analyse the KOWWIN training set and obtain fragment and correction factor statistics for the training and validation sets.
- Compare the training and validation sets of the KOWWIN model.

The AFC method is representative of group contribution methods, which rest on two fundamental assumptions:

Additivity: each structural component of a compound makes a separate and additive contribution to the property of interest for the compound.
Additivity is a widely accepted hypothesis, supported by evidence from empirical studies and contemporary quantum theories.

Transferability: these contributions are assumed to be the same across a wide variety of compounds.

The property of a single compound is modelled as a sum of the contributions associated with each atom or fragment (additivity), assuming that the contributions of identical atoms or fragments are the same as in the original compounds used to develop these contributions (transferability).

Examples of assumption failures:
- Molecules in which the same fragment occurs many times (e.g. a long aliphatic chain): additivity is extrapolated beyond the training set.
- Molecules with "uncommon" functional groups: transferability is difficult to establish because of poor statistics.

Complex structures are not always sufficiently represented, because the AFC method uses very simple fragments (e.g. compounds with large aliphatic rings are treated like aliphatic chains).

The estimate is the sum of the fragment and correction factor contributions plus the equation constant:

log Kow = Σ(fi × ni) + Σ(cj × nj) + 0.229

where
fi is the coefficient for each fragment;
ni is the number of times the fragment occurs in the structure;
cj is the coefficient for each correction factor;
nj is the number of times the correction factor occurs or is applied in the structure.

SMILES : Oc(c(cc(c1)Cc(cc(c(O)c2C(C)(C)C)C(C)(C)C)c2)C(C)(C)C)c1C(C)(C)C
CHEM   : Phenol, 4,4'-methylenebis 2,6-bis(1,1-dimethylethyl)-
MOL FOR: C29 H44 O2
MOL WT : 424.67
-------+-----+--------------------------------------------+---------+--------
 TYPE  | NUM | LOGKOW FRAGMENT DESCRIPTION                | COEFF   | VALUE
-------+-----+--------------------------------------------+---------+--------
 Frag  | 12  | -CH3 [aliphatic carbon]                    |  0.5473 |  6.5676
 Frag  |  1  | -CH2- [aliphatic carbon]                   |  0.4911 |  0.4911
 Frag  | 12  | Aromatic Carbon                            |  0.2940 |  3.5280
 Frag  |  2  | -OH [hydroxy, aromatic attach]             | -0.4802 | -0.9604
 Frag  |  4  | -tert Carbon [3 or more carbon attach]     |  0.2676 |  1.0704
 Factor|  1  | -CH2- (aliphatic), 2 phenyl attach correc  | -0.2326 | -0.2326
 Factor|  2  | Ring rx: -OH / di-ortho;sec- or t- carbon  | -0.8500 | -1.7000
 Const |     | Equation Constant                          |         |  0.2290
-------+-----+--------------------------------------------+---------+--------
                                                            Log Kow =  8.9931

Methods

Software was developed to read the full text output of the SRC KOWWIN program and extract the fragment and correction factor statistics of the training and validation sets.

Data

Frequency is the number of compounds containing the fragment (percentage of the set in parentheses); Min and Max are the fragment counts per compound.

-----------------------------------+------------------------------+------------------------------
                                   | KOWWIN TRAINING SET          | VALIDATION SET
 FRAGMENT                          | Frequency    | Min | Max     | Frequency    | Min | Max
-----------------------------------+--------------+-----+---------+--------------+-----+---------
 Aromatic Carbon                   | 1786 (73%)   |  2  | 24      | 8725 (80%)   |  1  | 30
 CH3 [aliphatic carbon]            | 1388 (57%)   |  1  | 13      | 7353 (67%)   |  1  | 20
 CH2 [aliphatic carbon]            | 1076 (44%)   |  1  | 18      | 7016 (64%)   |  1  | 28
 CH [aliphatic carbon]             |  457 (18%)   |  1  | 16      | 3839 (35%)   |  1  | 23
 C [aliphatic carbon-No H not tert]|  229 (9%)    |  1  |  3      | 1343 (12%)   |  1  | 11
 O [oxygen aliphatic attach]       |  108 (4%)    |  1  |  5      | 1231 (11%)   |  1  | 12
 F [fluorine aliphatic attach]     |  103 (4%)    |  1  |  6      |  540 (5%)    |  1  | 23
 Cl [chlorine aliphatic attach]    |  100 (4%)    |  1  |  6      |  354 (3%)    |  1  | 12
-----------------------------------+--------------+-----+---------+--------------+-----+---------

-----------------------+---------------+---------------+------------------------+----------------------------
 ANALYZED DATA SET     | No. compounds | No. fragments | No. correction factors | Experimental Log Kow range
-----------------------+---------------+---------------+------------------------+----------------------------
 KOWWIN Training set   |  2434         | 186           | 322                    | -4.57, 8.19
 KOWWIN Validation set | 10901         | 172           | 316                    | -4.99, 11.71
-----------------------+---------------+---------------+------------------------+----------------------------

Application domain and prediction error

---------------------+--------------+----------------
 NUMBER OF COMPOUNDS | Training set | Validation set
---------------------+--------------+----------------
 All                 | 2434         | 10901
 In-domain           | 2434         | 10259
 Out-of-domain       |    0         |   651
---------------------+--------------+----------------

The average prediction error outside the application domain defined by the training set ranges is about twice the prediction error inside the domain. Note that this holds only on average: many individual compounds outside the domain have a low error, and some individual compounds inside the domain have a high error.

The training space, as defined by the fragment and correction factor ranges, consists of 5.44E+41 unique points. Of this enormous space the training set occupies only 2113 unique points (some of the 2434 points coincide). This means that only 3.88E-37 % of the training space is covered by the training set points! Since practical experience with the model is nevertheless good, the additivity assumption appears to be working within the training set space.
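The range-based domain and the coverage figure quoted above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' software: the function names, the toy training set, and the convention that an absent fragment counts as 0 are all hypothetical, and the poster does not say exactly how its 5.44E+41 points were counted (here the space is taken as the product of the number of integer values in each descriptor range).

```python
from math import prod


def training_ranges(training_counts):
    """Per-descriptor (min, max) over a list of {descriptor: count} dicts.

    Assumption: a descriptor that is absent from a compound has count 0.
    """
    names = set().union(*training_counts)
    return {
        name: (
            min(c.get(name, 0) for c in training_counts),
            max(c.get(name, 0) for c in training_counts),
        )
        for name in sorted(names)
    }


def in_domain(compound, ranges):
    """True if every descriptor count lies inside its training range.

    A descriptor never seen in training has the implicit range (0, 0),
    so any compound that contains it is out of domain.
    """
    for name in set(compound) | set(ranges):
        lo, hi = ranges.get(name, (0, 0))
        if not lo <= compound.get(name, 0) <= hi:
            return False
    return True


def box_size(ranges):
    """Number of integer points in the bounding box spanned by the ranges."""
    return prod(hi - lo + 1 for lo, hi in ranges.values())


def coverage_percent(training_counts, ranges):
    """Percentage of the bounding box occupied by distinct training points."""
    names = sorted(ranges)
    unique = {tuple(c.get(n, 0) for n in names) for c in training_counts}
    return 100.0 * len(unique) / box_size(ranges)


# Hypothetical toy training set (fragment counts per compound).
training = [
    {"CH3": 2, "CH2": 1},
    {"CH3": 1, "OH": 1},
    {"CH3": 2, "CH2": 1},  # coincides with the first compound
    {"CH2": 3},
]
ranges = training_ranges(training)

print(box_size(ranges))                    # 24
print(coverage_percent(training, ranges))  # 12.5
print(in_domain({"CH3": 2, "CH2": 3, "OH": 1}, ranges))  # True, although unseen
print(in_domain({"CH2": 5}, ranges))                     # False, count too high
print(in_domain({"CH3": 1, "Cl": 1}, ranges))            # False, unknown fragment

# The poster's own figures: 2113 unique points in a space of 5.44E+41 points.
poster_coverage = 100 * 2113 / 5.44e41
print(f"{poster_coverage:.2e} %")          # 3.88e-37 %
```

The third check shows why the coverage is so small: a bounding box accepts every combination of in-range counts, including combinations that no training compound exhibits.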
These observations support the view that, to determine the applicability of a (QSAR) model, it is essential to evaluate the model assumptions.

Figure and table captions:
- An excerpt from the 508 fragment list for KOWWIN and its representation in the training and validation sets
- Overlay between training and validation sets
- SRC KOWWIN full text output
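As a worked check of the SRC KOWWIN full text output reproduced earlier, the following Python sketch sums the fragment and correction factor contributions for the example phenol. The counts, coefficients, equation constant and training Log Kow range are copied from the poster; the function and variable names are hypothetical, and the final range test is only one reading of the Log Kow range criterion listed under Approach.

```python
# (description, count, coefficient) rows copied from the KOWWIN output.
FRAGMENTS = [
    ("-CH3 [aliphatic carbon]", 12, 0.5473),
    ("-CH2- [aliphatic carbon]", 1, 0.4911),
    ("Aromatic Carbon", 12, 0.2940),
    ("-OH [hydroxy, aromatic attach]", 2, -0.4802),
    ("-tert Carbon [3 or more carbon attach]", 4, 0.2676),
]
CORRECTIONS = [
    ("-CH2- (aliphatic), 2 phenyl attach correc", 1, -0.2326),
    ("Ring rx: -OH / di-ortho;sec- or t- carbon", 2, -0.8500),
]
EQUATION_CONSTANT = 0.2290

# Experimental Log Kow range of the KOWWIN training set, from the poster.
TRAINING_LOG_KOW_RANGE = (-4.57, 8.19)


def afc_log_kow(fragments, corrections, constant):
    """Sum count * coefficient over fragments and corrections, plus constant."""
    fragment_sum = sum(n * f for _, n, f in fragments)
    correction_sum = sum(n * c for _, n, c in corrections)
    return fragment_sum + correction_sum + constant


log_kow = afc_log_kow(FRAGMENTS, CORRECTIONS, EQUATION_CONSTANT)
print(f"Log Kow = {log_kow:.4f}")  # Log Kow = 8.9931

lo, hi = TRAINING_LOG_KOW_RANGE
within_training_range = lo <= log_kow <= hi
print(within_training_range)  # False: 8.9931 is above the training maximum of 8.19
```

In this example the listed fragment counts (12 -CH3, 12 aromatic carbons, 1 -CH2-) stay within the training ranges in the fragment table excerpt, yet the estimated Log Kow exceeds the training maximum, which is consistent with a compound being out of range because of its combination of fragments.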

