Published by Alannah James. Modified over 9 years ago.
3. Statistics

Test results on a drug such as (1), with 17 variants (number of compounds, n = 17) differing only in X, are tabulated with the values of physical descriptors such as log P, (log P)^2, σ and E_S obtained from tables.

Cmpd No   X     Activity   log P   (log P)^2   σ      E_S
1         F     1.5        1.0     2.0         0.17   2.4
2         Me    1.7        2.0     4.0         0.16   2.6
:
17        OMe   3.2        3.0     9.0         0.13   3.6

What goes into the QSAR equation?
The biological data (Activity), C.
log P, (log P)^2, σ, E_S and a constant? (P = 5)
log P only? (Number of variables, P = 2, including the constant)
log P and (log P)^2? (P = 3)
Whichever gives the best result! How do we choose?
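The choice between "log P only" and "log P and (log P)^2" can be tried out numerically. The following is a minimal sketch, not the method used for the slides: it assumes ordinary least squares solved through the normal equations, and the seven log P / activity pairs are hypothetical values invented for illustration (the full 17-compound table is not reproduced here). The function name fit_least_squares is likewise an assumption.

```python
def fit_least_squares(rows, y):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y.

    rows: one list of descriptor values per compound (no constant);
    a constant column of 1s is added here, so P = len(rows[0]) + 1.
    Returns (coefficients, r_squared, s).
    """
    X = [[1.0] + list(row) for row in rows]
    n, p = len(X), len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [[sum(X[k][i] * X[k][j] for k in range(n)) for j in range(p)]
         + [sum(X[k][i] * y[k] for k in range(n))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda idx: abs(A[idx][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for idx in range(p):
            if idx != col:
                f = A[idx][col] / A[col][col]
                A[idx] = [a - f * b for a, b in zip(A[idx], A[col])]
    coefs = [A[i][p] / A[i][i] for i in range(p)]
    pred = [sum(c * x for c, x in zip(coefs, row)) for row in X]
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    mean_y = sum(y) / n
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot
    s = (ss_res / (n - p)) ** 0.5  # standard deviation of the fit
    return coefs, r_squared, s


# Hypothetical data (NOT the values from the slide's table).
log_p = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
activity = [1.5, 2.1, 2.6, 2.8, 2.9, 2.7, 2.4]

lin = fit_least_squares([[x] for x in log_p], activity)          # P = 2
quad = fit_least_squares([[x, x * x] for x in log_p], activity)  # P = 3

print(f"log P only:            r^2 = {lin[1]:.3f}, s = {lin[2]:.3f}")
print(f"log P and (log P)^2:   r^2 = {quad[1]:.3f}, s = {quad[2]:.3f}")
```

Adding a variable never lowers r^2, so a higher r^2 alone does not settle the choice; s, t and F also have to be looked at.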
3.1. What comes out of the QSAR?

Standard deviation, s = 0.13: how well are the data predicted (i.e. what is the error)?

Correlation coefficient, r = 0.96. The equation explains 92% of the biological data (r^2 = 0.92 = 0.96 × 0.96). An r^2 > 0.95 is probably fortuitous.

F(4,12) = 10.9. Why 4,12? Number of descriptors = 4; n - P = 17 - 5 = 12.

Is n high enough? Usually need at least 5 compounds per variable. Here P = 5, so n should ideally be 25, but we have n = 17. If we let P = 17, r would equal 1 and we would get a perfect fit, but it would be meaningless.

What does F mean? Generally, the higher F the better. Whether a given number, like 10.9, is good or bad (high or low) depends on the subscripts. F gives the probability that the whole equation arose by chance. For a good equation this probability may be, say, 0.01 (1%), i.e. a 99% chance that the equation is good. These values are given by statistical programs.
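The bookkeeping behind these numbers can be written out as a short sketch. The function name regression_bookkeeping and the per_variable argument are assumptions made for illustration; the inputs (n = 17, four descriptors, r = 0.96) and the "5 compounds per variable" rule are taken from the slide.

```python
def regression_bookkeeping(n_compounds, n_descriptors, r, per_variable=5):
    """Collect the simple quantities quoted alongside a QSAR equation."""
    p = n_descriptors + 1  # P counts the constant as a variable
    return {
        "P": p,
        # Subscripts of F: (number of descriptors, n - P)
        "F_subscripts": (n_descriptors, n_compounds - p),
        # Fraction of the biological data explained by the equation
        "r_squared": r * r,
        # Rule of thumb: at least 5 compounds per variable
        "compounds_needed": per_variable * p,
    }


stats = regression_bookkeeping(n_compounds=17, n_descriptors=4, r=0.96)
print(stats["F_subscripts"])         # the two subscripts of F
print(round(stats["r_squared"], 2))  # r^2 to two decimal places
print(stats["compounds_needed"])     # ideal n for this P
```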
3.2. Statistics: comparing equations

Equation (5) has a low r.
The E_S coefficient is also low, which suggests the term in E_S is not significant.
E_S is a steric term that says how large a group is.
Much better.
The (log P)^2 coefficient is small, but the term is significant because (log P)^2 is big.
3.3. Statistics: t-test

Used to check the significance of individual terms/coefficients.
t is the coefficient divided by its standard error.
t for all terms in equations (5)-(7) is:
t5 = 0.02/0.02 = 1; t6 = 4.98/0.99 = 5.03; t7,1 = 3.4/0.1 = 34; t7,2 = 9.4/0.01 = 940
If t is low (< ~2) the term is not significant.
If t is high (> ~2) the term is significant.
The probability that the term is not random is usually printed by statistics programs.
F is similar but applies to the whole equation rather than to individual terms.
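The t values above can be reproduced with a few lines. This is a minimal sketch: the coefficient and error pairs are the ones quoted on the slide, while the names t_value, is_significant and the dictionary layout are assumptions. The cut-off of about 2 is the slide's rule of thumb, not an exact critical value.

```python
def t_value(coefficient, standard_error):
    """t statistic for one term: coefficient divided by its standard error."""
    return coefficient / standard_error


def is_significant(t, threshold=2.0):
    """Rule of thumb from the slide: a term is significant if |t| > ~2."""
    return abs(t) > threshold


# (coefficient, standard error) pairs as quoted for equations (5)-(7).
terms = {
    "t5": (0.02, 0.02),
    "t6": (4.98, 0.99),
    "t7,1": (3.4, 0.1),
    "t7,2": (9.4, 0.01),
}

t_values = {name: t_value(c, se) for name, (c, se) in terms.items()}
for name, t in t_values.items():
    verdict = "significant" if is_significant(t) else "not significant"
    print(f"{name} = {t:.2f} ({verdict})")
```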
3.4. Statistics summary

Use the following to decide whether a QSAR is good or not:
r, or preferably r^2: is it near 1.0?
s: is it small?
t: is it > 2?
F: is it high enough?
Value of whole term: is it significant?
n: is it high enough?
2.4. Some limitations of QSAR

π, σ are only valid for substituted benzenes; need to consider diverse structures.
For new groups, π, σ are not available in tables.
Difficult to deal with inactive compounds (what is activity?) or crude biological data (that may be expressed as +, ++, etc.).
Need to synthesise at least 5 compounds per descriptor.
Descriptors π, σ, etc. are often related to each other, so an increase in activity attributed to an increase in π may actually be due to an increase in size.
Does not address the conformation of the drug.
Only gives optimum values of π, σ, etc., not the structure of a new drug.
Compounds must be described similarly even if their structures are very different (using log P instead of π can get round the problem, but then log P has to be measured, whereas π does not as it is found in tables).
Can prevent researchers looking at a new series of compounds.
Often have to use the response from a single concentration rather than the concentration needed to achieve a set effect, with a resulting loss of accuracy.
A single QSAR only addresses one property; may need to consider solubility, stability, absorption, metabolism, transport, safety…
QSAR is only valid if all compounds in the series operate by a common mechanism. This is often not the case.
Requires accurate data on weakly active compounds.
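The point about related descriptors can be checked before fitting by correlating the descriptor columns with each other. A minimal sketch, assuming a plain Pearson correlation; the function name pearson_r is an assumption, and the five pairs of descriptor values are hypothetical, not taken from tables.

```python
def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical descriptor values for five substituents (illustration only).
pi_values = [0.0, 0.5, 1.0, 1.5, 2.0]
size_values = [1.0, 1.6, 1.9, 2.6, 3.0]

r_descriptors = pearson_r(pi_values, size_values)
print(f"r between the two descriptors = {r_descriptors:.2f}")
```

If two descriptors correlate this strongly across the series, a regression cannot tell which of them is responsible for the change in activity.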