3. Statistics
Test results on a drug such as (1), with 17 variants (number of compounds, n = 17) differing only in the substituent X, are tabulated together with the values of physical descriptors such as log P, (log P)², σ and E_S obtained from tables.

Cmpd No   X     Activity   log P   (log P)²   σ   E_S
1         F
          Me
⋮
17        OMe

What goes into the QSAR equation? The biological data (Activity, C) can be fitted against:
- log P, (log P)², σ, E_S and a constant? (P = 5)
- log P only? (number of variables, P = 2, including the constant)
- log P and (log P)²? (P = 3)
Whichever gives the best result! How do we choose?
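One way to explore which descriptor set "gives the best result" is to fit each candidate set by least squares and compare the fits. The following is a minimal pure-Python sketch, assuming ordinary least squares solved through the normal equations; the function names `fit_qsar` and `solve_linear_system` are hypothetical, and the data are invented (activity generated from a made-up parabolic relation in log P), not the values for compound (1).

```python
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so that row operations act on a and b together.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Bring the row with the largest pivot up, for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution from the last row upwards.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_qsar(activity: Sequence[float],
             descriptors: Sequence[Sequence[float]]) -> List[float]:
    """Least-squares fit of activity = c0 + c1*d1 + c2*d2 + ...

    descriptors holds one row per compound. A leading 1.0 is added for the
    constant, so P = number of descriptors + 1. Returns [c0, c1, ...].
    """
    rows = [[1.0, *d] for d in descriptors]
    p = len(rows[0])
    # Normal equations: (X^T X) c = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, activity)) for i in range(p)]
    return solve_linear_system(xtx, xty)


if __name__ == "__main__":
    # Hypothetical log P values; activity is invented for illustration.
    log_p = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
    activity = [1.0 + 2.0 * x - 0.5 * x * x for x in log_p]
    print("log P only (P = 2):", fit_qsar(activity, [[x] for x in log_p]))
    print("log P, (log P)^2 (P = 3):",
          fit_qsar(activity, [[x, x * x] for x in log_p]))
```

Because the invented activity is parabolic in log P, the P = 3 fit recovers the constant and both coefficients, while the log P-only line cannot; judging such fits objectively is what the statistics that follow are for.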

3. Statistics
3.1. What comes out of the QSAR?
F(4, 12) = 10.9. (Why 4, 12? The number of descriptors is 4, and n − P = 17 − 5 = 12.)
Correlation coefficient, r = 0.96. The equation explains 92% of the variation in the biological data (r² = 0.96 × 0.96 = 0.92); an r² above 0.95 may be fortuitous.
Is n high enough? Usually at least 5 compounds are needed per variable. Here n = 17; ideally it should be 25 (5 × P, with P = 5). If we let P = 17, r would equal 1 and we would get a perfect fit, but it would be meaningless.
What does F mean? Generally, the higher F, the better. Whether a given value, such as 10.9, is good or bad (high or low) depends on the subscripts. F tests whether the equation as a whole is significant rather than random: statistical programs report the probability of obtaining such an F by chance. For a good equation this probability may be, say, 0.01 (1%), i.e. 99% confidence that the equation is not due to chance.
Standard deviation, s: how well are the data predicted (i.e. what is the error)?
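The quantities above can be computed directly from observed and predicted activities. The sketch below assumes an ordinary least-squares equation with a constant, for which the standard relations are r² = 1 − SS_resid/SS_total, s = √(SS_resid/(n − P)) and F = (r²/(P − 1)) / ((1 − r²)/(n − P)); the function names are hypothetical, and the 5-compounds-per-variable check simply encodes the rule of thumb from the slide.

```python
import math
from typing import Sequence, Tuple


def degrees_of_freedom(n: int, p: int) -> Tuple[int, int]:
    """F subscripts: (number of descriptors, n - P), where P counts the constant."""
    return p - 1, n - p


def fit_statistics(observed: Sequence[float],
                   predicted: Sequence[float], p: int) -> dict:
    """r^2, s and F for a least-squares equation with P parameters (incl. constant)."""
    n = len(observed)
    mean = sum(observed) / n
    ss_total = sum((y - mean) ** 2 for y in observed)
    ss_resid = sum((y - f) ** 2 for y, f in zip(observed, predicted))
    r2 = 1.0 - ss_resid / ss_total
    df1, df2 = degrees_of_freedom(n, p)
    s = math.sqrt(ss_resid / df2)          # standard deviation of the fit
    f = (r2 / df1) / ((1.0 - r2) / df2)    # F for the whole equation
    return {"r2": r2, "s": s, "F": f, "df": (df1, df2)}


def enough_compounds(n: int, p: int, per_variable: int = 5) -> bool:
    """Rule of thumb: at least 5 compounds per variable (P includes the constant)."""
    return n >= per_variable * p


if __name__ == "__main__":
    print(degrees_of_freedom(17, 5))   # subscripts for n = 17, P = 5
    print(enough_compounds(17, 5))     # 17 < 5 * 5
    # Hypothetical observed/predicted values for a P = 2 equation.
    print(fit_statistics([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.1, 3.9], 2))
```

For n = 17 and P = 5, `degrees_of_freedom` returns (4, 12), the subscripts on F, and `enough_compounds` returns False because 25 compounds would be needed. Turning F into a probability additionally requires the F distribution, which statistical programs provide.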

3.2. Statistics: comparing equations
Equation (5) has a low r, and its E_S coefficient is also low, which suggests that the E_S term is not significant. (E_S is a steric term that describes how large a group is.)
The next equation is much better.
The (log P)² coefficient is small, but the term is still significant because the values of (log P)² are large.
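The point about the (log P)² term can be illustrated numerically: the size of a coefficient depends on the scale of its descriptor, whereas significance is judged by t = coefficient / standard error, which does not change when the descriptor is rescaled. The sketch below uses a one-descriptor regression for brevity; the function name and the data are hypothetical.

```python
import math
from typing import Sequence, Tuple


def slope_and_t(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Fit y = a + b*x by least squares; return (b, t) with t = b / SE(b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    ss_resid = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(ss_resid / (n - 2))    # n - P with P = 2
    se_b = s / math.sqrt(sxx)            # standard error of the slope
    return b, b / se_b


if __name__ == "__main__":
    # Hypothetical descriptor values and activities.
    x = [1.0, 2.0, 3.0, 4.0, 5.0]
    y = [2.1, 3.9, 6.2, 7.8, 10.1]
    print(slope_and_t(x, y))
    # Same descriptor on a 100x larger scale: coefficient 100x smaller, same t.
    print(slope_and_t([100.0 * xi for xi in x], y))
```

Multiplying the descriptor by 100 divides its coefficient by 100 but leaves t unchanged, so a small coefficient on a large-valued term can still be highly significant.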

3.3. Statistics: t-test
The t-test is used to check the significance of individual terms (coefficients). For each term, t is its coefficient divided by the standard error of that coefficient. For the terms in equations (5) to (7):
equation (5): t = 0.02/0.02 = 1; equation (6): t = 4.98/0.99 = 5.03; equation (7), first term: t = 3.4/0.1 = 34; equation (7), second term: t = 9.4/0.01 = 940.
If t is low (below about 2), the term is not significant; if t is high (above about 2), the term is significant. Statistical programs usually also print the probability (p value) for each term. F is similar, but applies to the whole equation rather than to individual terms.
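The t-test arithmetic for equations (5) to (7) can be reproduced in a few lines. This is a minimal sketch using the coefficient and standard-error pairs quoted above; the function names are hypothetical, and the "about 2" cut-off is the rule of thumb from the slide, applied to |t| so that negative coefficients are handled too.

```python
def t_value(coefficient: float, std_error: float) -> float:
    """t for one term: its coefficient divided by the coefficient's standard error."""
    return coefficient / std_error


def is_significant(t: float, threshold: float = 2.0) -> bool:
    """Rule of thumb: a term with |t| above about 2 is significant."""
    return abs(t) > threshold


# (coefficient, standard error) pairs quoted for equations (5) to (7).
terms = {
    "eq (5)": (0.02, 0.02),
    "eq (6)": (4.98, 0.99),
    "eq (7), first term": (3.4, 0.1),
    "eq (7), second term": (9.4, 0.01),
}

if __name__ == "__main__":
    for name, (coef, se) in terms.items():
        t = t_value(coef, se)
        print(f"{name}: t = {t:.2f}, significant: {is_significant(t)}")
```

Running it prints t values of 1.00, 5.03, 34.00 and 940.00, with only the equation (5) term failing the check.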

3.4. Statistics summary
Use the following to decide whether a QSAR is good or not:
- r, or preferably r²: is it near 1.0?
- s: is it small?
- t: is it > 2?
- F: is it high enough, i.e. is the whole equation significant?
- n: is it high enough?
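The checklist can be written as a single function that reports which criteria an equation fails. This is a sketch under stated assumptions: the slide gives no numeric limits for "near 1.0" or "small", so `r2_min` and `s_max` are caller-chosen assumptions, and `f_critical` must be looked up for (P − 1, n − P) degrees of freedom at the chosen significance level. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class QsarStats:
    r2: float
    s: float
    t_values: Sequence[float]
    f: float
    n: int
    p: int  # number of parameters, including the constant


def review_qsar(stats: QsarStats, r2_min: float, s_max: float,
                f_critical: float) -> List[str]:
    """Return the checklist items a QSAR equation fails (empty list = all passed)."""
    problems = []
    if stats.r2 < r2_min:
        problems.append(f"r^2 = {stats.r2:.2f} is not near 1.0")
    if stats.s > s_max:
        problems.append(f"s = {stats.s:.2f} is not small")
    for i, t in enumerate(stats.t_values, start=1):
        if abs(t) <= 2.0:
            problems.append(f"term {i}: |t| = {abs(t):.2f} is not > 2")
    if stats.f < f_critical:
        problems.append(f"F = {stats.f:.1f} is below the critical value {f_critical}")
    if stats.n < 5 * stats.p:
        problems.append(f"n = {stats.n} is below 5 compounds per variable ({5 * stats.p})")
    return problems


if __name__ == "__main__":
    # All numbers below are hypothetical.
    eq = QsarStats(r2=0.90, s=0.25, t_values=[4.1, 1.3], f=25.0, n=12, p=3)
    for problem in review_qsar(eq, r2_min=0.8, s_max=0.3, f_critical=4.3):
        print(problem)
```

For the hypothetical equation, the second term (|t| = 1.3) and the number of compounds (12 < 15) are flagged.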

2.4. Some limitations of QSAR
- π and σ are only valid for substituted benzenes; diverse structures need to be considered.
- For new groups, π and σ are not available in tables.
- It is difficult to deal with inactive compounds (what is their activity?) or with crude biological data (which may be expressed as +, ++, etc.).
- At least 5 compounds need to be synthesised per descriptor.
- Descriptors such as π and σ are often related to each other, so an increase in activity attributed to an increase in π may actually be due to an increase in size.
- QSAR does not address the conformation of the drug.
- It only gives optimum values of π, σ, etc., not the structure of a new drug.
- Compounds must be described in the same way even if their structures are very different. Using log P instead of π can get round the problem, but then log P has to be measured (whereas π does not, as it is found in tables).
- It can prevent researchers from looking at a new series of compounds.
- Often the response at a single concentration has to be used, rather than the concentration needed to achieve a set effect, with a consequent loss of accuracy.
- A single QSAR only addresses one property; solubility, stability, absorption, metabolism, transport, safety… may also need to be considered.
- A QSAR is only valid if all compounds in the series operate by a common mechanism, which is often not the case.
- It requires accurate data on weakly active compounds.
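The limitation that descriptors are often related to each other can at least be detected before fitting, by computing the Pearson correlation between every pair of descriptor columns. The sketch below is illustrative: the descriptor names and values are hypothetical, and the cut-off of |r| > 0.8 is an assumed threshold, not one given in the lecture.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson_r(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient between two descriptor columns."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def flag_intercorrelated(columns: Dict[str, Sequence[float]],
                         limit: float = 0.8) -> List[Tuple[str, str, float]]:
    """Return descriptor pairs whose |r| exceeds limit (limit is an assumed cut-off)."""
    names = list(columns)
    flagged = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            r = pearson_r(columns[names[i]], columns[names[j]])
            if abs(r) > limit:
                flagged.append((names[i], names[j], r))
    return flagged


if __name__ == "__main__":
    # Hypothetical descriptor values for four compounds.
    descriptors = {
        "pi": [0.1, 0.5, 0.9, 1.2],
        "size": [1.0, 2.1, 2.9, 4.2],
        "sigma": [0.2, -0.1, 0.3, 0.0],
    }
    for a, b, r in flag_intercorrelated(descriptors):
        print(f"{a} and {b} are strongly correlated (r = {r:.2f})")
```

With these invented values only the "pi" and "size" columns are flagged, which is exactly the situation in which an apparent effect of π on activity may really be an effect of size.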