Scientific Practice Regression.

Where We Are/Where We Are Going
We have looked at how correlation shows how two things might be associated
  eg between arm length and leg length
  no causality implied, ie leg length is not responsible for arm length!
The correlation coefficient, r, assumes a linear 'fit' between the data and reflects how far from that 'line of best fit' the data lie
Regression takes this one step further
  it describes the fit mathematically
  it generally implies a causal link, eg increased BP causes increased mortality

Independent/Dependent Variables
As a causal link is implied, then…
  the independent variable is the thing doing the influencing
  the dependent variable is the one being influenced
By convention, if we were to plot this graphically…
  the x-axis represents the independent variable
  the y-axis represents the dependent variable
  eg blood pressure on the x-axis, cardiovascular mortality on the y-axis

The Equation of the Straight Line
Any linear relationship can be described as y = mx + c
y can be calculated (predicted) for any x using…
  c, the intercept: where the line crosses the y-axis, ie the value of y when x = 0
  m, the slope of the line: the change in y per unit change in x; non-zero if there is a relationship
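As a quick illustration, the straight-line equation is trivial to evaluate in code; a minimal sketch (the function name predict_y and the example numbers are mine, not from the slides):

```python
def predict_y(x, m, c):
    """Predict y for a given x using the straight line y = mx + c."""
    return m * x + c

# a line with slope m = 2 and intercept c = 1
print(predict_y(3, 2, 1))  # y = 2*3 + 1 = 7
```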

The Line of Best Fit
For a given set of data, linear regression derives the straight-line equation that best describes the data by minimising the overall (squared) distance of the data points from that straight line
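The least-squares fit itself can be sketched in Python; numpy's degree-1 polyfit minimises the sum of squared vertical distances from the line (the data below are made up for illustration):

```python
import numpy as np

# toy data lying roughly on a straight line
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.1, 4.9, 7.2, 9.0, 10.8])

# a degree-1 polynomial fit is the least-squares straight line
m, c = np.polyfit(x, y, 1)
print(f"line of best fit: y = {m:.2f}x + {c:.2f}")  # y = 1.95x + 1.15
```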

The Power of Linear Regression
Eg, data from a practical class looked at the relationship between the latency of the Achilles tendon stretch reflex (ms) and height (cm)
  the taller the person, the longer the nerve pathway mediating the reflex

The Power of Linear Regression
The class results (n = 85)

The Power of Linear Regression
The line of best fit: y = 0.1337x + 12.225
  the non-zero slope suggests a relationship

Testing the Line of Best Fit
But even if we used random data, the line of best fit would have a non-zero slope!
  so how do we know if the slope is significantly different from zero?
The reported slope is the best estimate based on data that do not perfectly fit a straight line
  when we use a collection of data to estimate a single value, that estimate comes with a standard error
  eg data → mean +/- SE of the mean
  in our case, data → slope +/- SE of the slope
  we can predict the 95% CI of the slope!
  does the 95% CI encompass zero?
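The standard error and 95% CI of the slope can be computed by hand; a sketch under made-up data (not the class results), using the usual formulas s = sqrt(SSE/(n-2)) and SE(m) = s/sqrt(Sxx):

```python
import numpy as np
from scipy import stats

# illustrative data, not the class data set
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.2, 4.1, 5.8, 8.3, 9.9, 12.2])
n = len(x)

m, c = np.polyfit(x, y, 1)

# residual standard error: sqrt(SSE / (n - 2))
residuals = y - (m * x + c)
s = np.sqrt(np.sum(residuals ** 2) / (n - 2))

# standard error of the slope: s / sqrt(Sxx)
se_m = s / np.sqrt(np.sum((x - x.mean()) ** 2))

# the 95% CI uses the t critical value with n - 2 degrees of freedom
t_crit = stats.t.ppf(0.975, n - 2)
lo, hi = m - t_crit * se_m, m + t_crit * se_m
print(f"slope = {m:.3f}, 95% CI ({lo:.3f}, {hi:.3f})")
# here the CI excludes zero, so the slope is significantly non-zero
```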

The Power of Linear Regression
In reality, Minitab will do the significance calculation for us
  a t-test on the slope and its SE
  the Null Hypothesis is that the slope is zero

The regression equation is latency = 12.2 + 0.134 height

Predictor   Coef      SE Coef   T      P
Constant    12.225    7.072     1.73   0.088
height      0.13366   0.04144   3.23   0.002

S = 4.16466   R-Sq = 11.1%   R-Sq(adj) = 10.1%

p < 0.05, so the slope is significantly different from zero
  reflex latency increases with height (0.134 ms/cm)

The Power of Linear Regression
The intercept is also reported as a non-zero value
  is it significantly different from zero? (the Null Hypothesis is that it is zero)
  a t-test on the intercept and its SE

The regression equation is latency = 12.2 + 0.134 height

Predictor   Coef      SE Coef   T      P
Constant    12.225    7.072     1.73   0.088
height      0.13366   0.04144   3.23   0.002

S = 4.16466   R-Sq = 11.1%   R-Sq(adj) = 10.1%

p > 0.05, so the intercept is not significantly different from zero
  the predictive equation simplifies to latency = 0.134 height
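The same t-tests on slope and intercept can be reproduced outside Minitab; a sketch with scipy.stats.linregress (the data are simulated in the spirit of the class exercise, not the real results, and intercept_stderr needs scipy >= 1.6):

```python
import numpy as np
from scipy import stats

# simulated height/latency data, not the real class results
rng = np.random.default_rng(42)
height = rng.uniform(150.0, 195.0, 85)
latency = 12.0 + 0.13 * height + rng.normal(0.0, 4.0, 85)

res = stats.linregress(height, latency)
n = len(height)

# slope: linregress reports the t-test p-value directly
print(f"slope     = {res.slope:.3f}, p = {res.pvalue:.4f}")

# intercept: t = Coef / SE Coef, tested against t with n - 2 df
t_int = res.intercept / res.intercept_stderr
p_int = 2 * stats.t.sf(abs(t_int), n - 2)
print(f"intercept = {res.intercept:.3f}, p = {p_int:.4f}")
```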

The Power of Linear Regression
The analysis also yields r-squared
  this is the square of the correlation coefficient
  the proportion of variation in the y-axis variable that can be explained by variation in the x-axis variable

The regression equation is latency = 12.2 + 0.134 height

Predictor   Coef      SE Coef   T      P
Constant    12.225    7.072     1.73   0.088
height      0.13366   0.04144   3.23   0.002

S = 4.16466   R-Sq = 11.1%   R-Sq(adj) = 10.1%

at around 10%, this is very low, but the relationship is still highly significant!
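The link between r and r-squared is easy to check numerically; a minimal sketch with made-up, nearly linear data:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])

# Pearson correlation coefficient r
r = np.corrcoef(x, y)[0, 1]

# r-squared: proportion of the variation in y explained by x
print(f"r = {r:.4f}, r-squared = {r ** 2:.4f}")
```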

R-squared and Significance
Just because something has a low r-squared does not mean it is not significant
  it means it has low predictive power
Eg the more clothes worn, the heavier a person's measured weight
  clothes significantly influence measured weight
  but clothing can only account for a small amount of the variation in weight that we see
  ie r-squared is small

Summary
Linear regression extends correlation by reporting the mathematical 'line of best fit', y = mx + c
The slope needs to be tested statistically to 'prove' the relationship is 'real'
  eg the Null Hypothesis is that m = 0
The intercept should also be tested to see if it is non-zero
The equation of the 'line of best fit' is predictive
  ie given a value of x, you can predict y
  the usefulness of this depends on r-squared
  the proportion of variation in y 'explained' by x (0-100%)