Introductory Statistics for Laboratorians Dealing with High-Throughput Data Sets (Centers for Disease Control).

Similar presentations
11-1 Empirical Models Many problems in engineering and science involve exploring the relationships between two or more variables. Regression analysis.
Lesson 10: Linear Regression and Correlation
13- 1 Chapter Thirteen McGraw-Hill/Irwin © 2005 The McGraw-Hill Companies, Inc., All Rights Reserved.
6-1 Introduction To Empirical Models 6-1 Introduction To Empirical Models.
Regression Analysis Module 3. Regression Regression is the attempt to explain the variation in a dependent variable using the variation in independent.
Learning Objectives Copyright © 2002 South-Western/Thomson Learning Data Analysis: Bivariate Correlation and Regression CHAPTER sixteen.
Regression Analysis Once a linear relationship is defined, the independent variable can be used to forecast the dependent variable. Y ^ = bo + bX bo is.
Learning Objectives Copyright © 2004 John Wiley & Sons, Inc. Bivariate Correlation and Regression CHAPTER Thirteen.
Objectives (BPS chapter 24)
Simple Linear Regression
Simple Linear Regression 1. Correlation indicates the magnitude and direction of the linear relationship between two variables. Linear Regression: variable.
LINEAR REGRESSION: Evaluating Regression Models Overview Assumptions for Linear Regression Evaluating a Regression Model.
LINEAR REGRESSION: Evaluating Regression Models. Overview Assumptions for Linear Regression Evaluating a Regression Model.
LINEAR REGRESSION: Evaluating Regression Models. Overview Standard Error of the Estimate Goodness of Fit Coefficient of Determination Regression Coefficients.
Bivariate Regression CJ 526 Statistical Analysis in Criminal Justice.
SIMPLE LINEAR REGRESSION
Chapter Topics Types of Regression Models
Topics: Regression Simple Linear Regression: one dependent variable and one independent variable Multiple Regression: one dependent variable and two or.
Ch 11: Correlations (pt. 2) and Ch 12: Regression (pt.1) Nov. 13, 2014.
SIMPLE LINEAR REGRESSION
Lecture 11 Chapter 6. Correlation and Linear Regression.
11-1 Empirical Models Many problems in engineering and science involve exploring the relationships between two or more variables. Regression analysis.
Lecture 5: Simple Linear Regression
1 Chapter 17: Introduction to Regression. 2 Introduction to Linear Regression The Pearson correlation measures the degree to which a set of data points.
Correlation and Regression Analysis
Simple Linear Regression Analysis
Linear Regression.  Uses correlations  Predicts value of one variable from the value of another  ***computes UKNOWN outcomes from present, known outcomes.
Relationships Among Variables
Example of Simple and Multiple Regression
Correlation and Regression
Lecture 15 Basics of Regression Analysis
SIMPLE LINEAR REGRESSION
Introduction to Linear Regression and Correlation Analysis
Statistics for Business and Economics 8 th Edition Chapter 11 Simple Regression Copyright © 2013 Pearson Education, Inc. Publishing as Prentice Hall Ch.
ASSOCIATION BETWEEN INTERVAL-RATIO VARIABLES
1 FORECASTING Regression Analysis Aslı Sencer Graduate Program in Business Information Systems.
Chapter 6 & 7 Linear Regression & Correlation
OPIM 303-Lecture #8 Jose M. Cruz Assistant Professor.
Copyright © 2010 Pearson Education, Inc Chapter Seventeen Correlation and Regression.
Section 5.2: Linear Regression: Fitting a Line to Bivariate Data.
Multiple regression - Inference for multiple regression - A case study IPS chapters 11.1 and 11.2 © 2006 W.H. Freeman and Company.
Linear Trend Lines = b 0 + b 1 X t Where is the dependent variable being forecasted X t is the independent variable being used to explain Y. In Linear.
Business Statistics: A Decision-Making Approach, 6e © 2005 Prentice-Hall, Inc. Chap 13-1 Introduction to Regression Analysis Regression analysis is used.
1 Regression Analysis The contents in this chapter are from Chapters of the textbook. The cntry15.sav data will be used. The data collected 15 countries’
STA291 Statistical Methods Lecture LINEar Association o r measures “closeness” of data to the “best” line. What line is that? And best in what terms.
Multiple Regression. Simple Regression in detail Y i = β o + β 1 x i + ε i Where Y => Dependent variable X => Independent variable β o => Model parameter.
Environmental Modeling Basic Testing Methods - Statistics III.
Copyright © 2010 Pearson Education, Inc Chapter Seventeen Correlation and Regression.
CHAPTER 5 CORRELATION & LINEAR REGRESSION. GOAL : Understand and interpret the terms dependent variable and independent variable. Draw a scatter diagram.
Linear Prediction Correlation can be used to make predictions – Values on X can be used to predict values on Y – Stronger relationships between X and Y.
Regression Analysis. 1. To comprehend the nature of correlation analysis. 2. To understand bivariate regression analysis. 3. To become aware of the coefficient.
Copyright © 2011 by The McGraw-Hill Companies, Inc. All rights reserved. McGraw-Hill/Irwin Simple Linear Regression Analysis Chapter 13.
© The McGraw-Hill Companies, Inc., Chapter 10 Correlation and Regression.
Chapter 14 Introduction to Regression Analysis. Objectives Regression Analysis Uses of Regression Analysis Method of Least Squares Difference between.
Multiple Regression (رگرسیون چندگانه)
Chapter 11 Linear Regression and Correlation. Explanatory and Response Variables are Numeric Relationship between the mean of the response variable and.
Chapter 20 Linear and Multiple Regression
11-1 Empirical Models Many problems in engineering and science involve exploring the relationships between two or more variables. Regression analysis.
Correlation and Simple Linear Regression
Inference for Regression
Multiple Regression.
Linear Regression Prof. Andy Field.
Correlation and Simple Linear Regression
CHAPTER 29: Multiple Regression*
Correlation and Simple Linear Regression
Simple Linear Regression and Correlation
Introduction to Regression
Correlation and Simple Linear Regression
Correlation and Simple Linear Regression
Presentation transcript:

Introductory Statistics for Laboratorians Dealing with High-Throughput Data Sets (Centers for Disease Control)

Regression/Prediction
In this case we can use X to predict Y with complete accuracy. The equation is Y = 2X + 1. For any X we can compute Y exactly; error is not a factor.

In this case we can't fit a nice straight line to the data. If we repeat the experiment we will get somewhat different results; random error is a factor.

If in this case we know that the relationship should be a straight line, and we believe the deviation from a line is due to error, we can fit a line to the data. There are many possible lines to choose from. How do we select the best line?

Principles of Prediction
If we don't know anything about the person being measured, our best bet is always to predict the mean. If we know the person is average on X, we should predict they will be average on Y. We want to select the line that produces the least error.
X scale: mean = 3.0. Y scale: mean = 7.3.

Development Sample
– A sample in which both X and Y are known
– Used to develop an equation that can be used to compute a predicted Y from X
– Used to compute the standard curve
Unknown Sample
– Use the equation developed above to predict Y for people or samples for whom we know X but not Y
X scale: mean = 3.0. Y scale: mean = 7.3.

Regression
Example case: Freddie Bruflot, X = 5, Y = 10. (Figure: the plot marks the actual Y, the predicted Y on the regression line, and the mean of Y, and splits the total residual into a regression component and an error component; the numeric labels did not survive the transcript.)

Computation of SSE
(Table: columns X, Y, Y predicted, Residual (error), and Residual squared for each case; mean of X = 3, mean of Y = 7.3. The squared residuals are summed to give the Sum of Squared Residuals; the individual values did not survive the transcript.)
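A minimal sketch of this computation in Python (NumPy), using hypothetical data chosen so the means match the slide (X mean = 3, Y mean = 7.3); the least-squares line and SSE are computed from scratch:

```python
import numpy as np

# Hypothetical development-sample data; the slide's actual table values
# were lost in the transcript (only the means, 3 and 7.3, are reported).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([4.0, 6.1, 7.2, 8.9, 10.3])

# Fit the least-squares line y_hat = b0 + b1 * x
b1, b0 = np.polyfit(x, y, deg=1)

y_pred = b0 + b1 * x            # predicted Y for each X
residuals = y - y_pred          # residual (error): actual minus predicted
sse = np.sum(residuals ** 2)    # Sum of Squared Residuals (SSE)

print(f"slope = {b1:.3f}, intercept = {b0:.3f}, SSE = {sse:.3f}")
```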

Correlation/Regression Example
A high level of a particular factor in blood samples (call it BF-Costly) is known to be highly predictive of cervical cancer. Measuring this specific factor is so expensive and time consuming that it is impractical. The following data are obtained concerning the relationship between a second, easily measured blood factor (call it BF-Cheap) and BF-Costly.

(Table: paired BF-Cheap and BF-Costly measurements for 15 samples; BF-Cheap mean = 71.13, BF-Costly mean = 72.60. The individual values did not survive the transcript.)

Here is a scatterplot of the data showing the relationship between BF-Cheap and BF-Costly. It looks like they are correlated. If the correlation is significant, this might be worth pursuing. Null hypothesis: the correlation is zero. Alpha = .05.

Test the Significance of the Correlation
The probability that the correlation is zero is 7.24E-05 (.0000724). We can reject the null hypothesis. It may well be worth it to develop a prediction equation that can be used to predict BF-Costly from BF-Cheap.
(SPSS Correlations table: Pearson correlation of BF-Cheap with BF-Costly, Sig. (2-tailed) = 7.24E-05, N = 15; the correlation is significant at the 0.01 level (2-tailed). The correlation coefficient itself did not survive the transcript.)
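The same test can be sketched with SciPy's pearsonr; the 15 paired values below are hypothetical stand-ins, since only N = 15 and the p-value survive in the transcript:

```python
from scipy.stats import pearsonr

# Hypothetical paired measurements standing in for the 15 samples on the slide
bf_cheap  = [62, 65, 66, 68, 70, 71, 72, 73, 74, 75, 76, 69, 72, 74, 77]
bf_costly = [61, 67, 64, 70, 71, 73, 70, 74, 75, 76, 78, 68, 73, 77, 80]

r, p_value = pearsonr(bf_cheap, bf_costly)
print(f"Pearson r = {r:.3f}, two-tailed p = {p_value:.2e}")

alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis that the correlation is zero")
```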

Regression Analysis
Null hypothesis: the slope of the regression line is zero. Alpha = .05. The probability (Sig.) is 7.24E-05, so we can reject the null hypothesis. What is the equation? Is it any good?
(SPSS ANOVA table: Regression, Residual, and Total rows with Sum of Squares, df, Mean Square, F, and Sig.; predictors: (Constant), BF-Cheap; dependent variable: BF-Costly. Only the significance, 7.24E-05, survives in the transcript.)
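A sketch of the equivalent slope test with scipy.stats.linregress, again on hypothetical stand-in data; for simple regression its p-value matches the correlation test on the previous slide:

```python
from scipy.stats import linregress

# Hypothetical stand-in data (the slide's 15 observations are not in the transcript)
bf_cheap  = [62, 65, 66, 68, 70, 71, 72, 73, 74, 75, 76, 69, 72, 74, 77]
bf_costly = [61, 67, 64, 70, 71, 73, 70, 74, 75, 76, 78, 68, 73, 77, 80]

fit = linregress(bf_cheap, bf_costly)
# fit.pvalue is the two-sided p-value for H0: slope = 0
print(f"slope = {fit.slope:.2f}, intercept = {fit.intercept:.2f}, p = {fit.pvalue:.2e}")
```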

The equation we are looking for will be of the form: Predicted BF-Costly = slope * BF-Cheap + y-intercept. The Y-intercept is called the "constant" and is 27.6. The slope (the number you multiply BF-Cheap by) is .63. Both of these are significant.
The equation is: Predicted BF-Costly = .63(BF-Cheap) + 27.6
(SPSS Coefficients table: unstandardized and standardized coefficients, standard errors, t, and Sig. for the constant and BF-Cheap; dependent variable: BF-Costly. The numeric entries other than those quoted above did not survive the transcript.)
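A minimal sketch of applying the reported equation (slope .63, intercept 27.6) to a new BF-Cheap measurement; the example input of 70 is an assumed value:

```python
def predict_bf_costly(bf_cheap: float) -> float:
    """Apply the regression equation reported on the slide."""
    slope, intercept = 0.63, 27.6   # values taken from the slide
    return slope * bf_cheap + intercept

# Example with an assumed BF-Cheap measurement of 70
print(predict_bf_costly(70.0))   # 0.63 * 70 + 27.6 = 71.7
```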

How good is the equation? What kind of accuracy can we expect? R-square (.715) is the proportion of the total variance accounted for by the equation. About 71.5% of the variance is accounted for; that means about 28.5% is not accounted for.
(SPSS Model Summary table: R, R Square, Adjusted R Square, and Std. Error of the Estimate; predictors: (Constant), BF-Cheap; dependent variable: BF-Costly. Only the R Square of .715 survives in the transcript.)
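R-square can also be computed directly as 1 - SSE/SST, the proportion of total variation accounted for; a small sketch with hypothetical actual and predicted values:

```python
import numpy as np

def r_squared(y: np.ndarray, y_pred: np.ndarray) -> float:
    """Proportion of the variance in y accounted for by the predictions (1 - SSE/SST)."""
    sse = np.sum((y - y_pred) ** 2)        # variation not accounted for
    sst = np.sum((y - np.mean(y)) ** 2)    # total variation around the mean
    return 1.0 - sse / sst

# Hypothetical actual and predicted values (not the slide's data)
y = np.array([61.0, 70.0, 72.0, 75.0, 80.0])
y_pred = np.array([64.0, 68.0, 73.0, 76.0, 78.0])
print(f"R-square = {r_squared(y, y_pred):.3f}")
```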

Using Linear Regression to Develop a Standard Curve for Real-Time PCR
Develop a standard curve from samples with known concentration.
– Y is the concentration
– X is the CP (crossing point)
Both X and Y are known. The relationship between concentration and output is not linear, it is an S-curve, but the relationship between log(concentration) and output is linear.

Standard Sample Data
(Table: CP, Concentration, and Log Concentration for each standard sample; the individual values did not survive the transcript.)

CP and concentration are highly correlated, and the correlation is negative. Now fit a regression line to these data and get the equation of the line.

Fit Regression Line
This is highly significant, of course. The Y-intercept is 8.11 and the slope is -.22. The equation is:
Predicted log concentration = -.22(CP) + 8.11
Predicted concentration = antilog(-.22(CP) + 8.11)
(SPSS Coefficients table: unstandardized and standardized coefficients, t, and Sig. for the constant (Sig. on the order of E-14) and CP (Sig. on the order of E-11); dependent variable: Log Concentration. The remaining numeric entries did not survive the transcript.)
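A sketch of fitting such a standard curve in Python: regress log10(concentration) of the standards on CP. The standards below are hypothetical, chosen so the fit roughly reproduces the reported slope (-.22) and intercept (8.11):

```python
import numpy as np

# Hypothetical standards: crossing point (CP) and known concentration
cp = np.array([14.0, 17.0, 20.0, 23.0, 26.0])
concentration = np.array([1.0e5, 2.2e4, 5.0e3, 1.1e3, 2.4e2])

# Linearize by taking log10(concentration), then fit the least-squares line on CP
log_conc = np.log10(concentration)
slope, intercept = np.polyfit(cp, log_conc, deg=1)
print(f"log10(concentration) = {slope:.2f} * CP + {intercept:.2f}")
```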

Standard Curve
For an unknown with a CP of 18.48, the predicted log concentration would be -.22(18.48) + 8.11 = 4.04. The antilog of 4.04 is 1.10E4, so the predicted concentration is about 1.10E4.
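The CP-to-concentration conversion can be sketched as a small helper that uses the reported slope and intercept:

```python
def concentration_from_cp(cp, slope=-0.22, intercept=8.11):
    """Convert a crossing point (CP) to concentration via the fitted standard curve."""
    log_conc = slope * cp + intercept   # predicted log10(concentration)
    return 10 ** log_conc               # antilog

# The slide's unknown: CP = 18.48 -> log10(concentration) = 4.04 -> concentration of about 1.1E4
print(f"{concentration_from_cp(18.48):.3g}")
```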