Simple linear regression Tron Anders Moger 4.10.2006.


Repetition: Testing: –Identify the data; continuous -> t-tests; proportions -> Normal approximation to the binomial distribution –If continuous: one sample, matched pairs, or two independent samples? –Assumptions: Are the data normally distributed? If two independent samples, are the variances equal in both groups? –Formulate H0 and H1 (H0 is always no difference, no effect of treatment etc.), choose significance level (α=5%) –Calculate the test statistic

Inference: The test statistic is usually standardized: (estimate − expected value under H0)/(estimated standard error) This gives you a location on the x-axis of a distribution Compare this value to the 2.5th and 97.5th percentiles of the distribution If smaller than the 2.5th percentile or larger than the 97.5th percentile, reject H0 P-value: the area in the tails of the distribution beyond the value of the test statistic (area below −|value| plus area above |value|) If smaller than 0.05, reject H0 If the confidence interval for the mean or mean difference (depending on which test you use) does not include the H0 value, reject H0

Last week: Looked at continuous, normally distributed variables Used t-tests to see if there was a significant difference between the means in two groups How strong is the relationship between two such variables? Correlation What if one wants to study the relationship between several such variables? Linear regression

Connection between variables We would like to study the connection between x and y!

Data from the first obligatory assignment: Birth weight and smoking Children of 189 women Low birth weight is a medical risk factor Does mother’s smoking status have any influence on the birth weight? Also interested in the relationship with other variables: Mother’s age, mother’s weight, high blood pressure, ethnicity etc.

Is birth weight normally distributed? From Explore in SPSS

Q-Q plot (check Normality plots with tests under plots):

Tests for normality (Tests of Normality table from SPSS):

              Kolmogorov-Smirnov(a)      Shapiro-Wilk
              Statistic  df  Sig.        Statistic  df  Sig.
birthweight     0.043   189  0.200(*)      0.992   189  0.438

* This is a lower bound of the true significance.
a Lilliefors Significance Correction

The null hypothesis is that the data are normal. A large p-value indicates a normal distribution. For large samples, the p-value tends to be low, so the graphical methods are more important

Pearson’s correlation coefficient r Measures the linear relationship between two variables r=1: All data lie on an increasing straight line r=−1: All data lie on a decreasing straight line r=0: No linear relationship In linear regression, R² (=r²) is often used as a measure of the explanatory power of the model R² close to 1 means that the observations are close to the line; R² close to 0 means that there is no linear relationship between the observations

Testing for correlation It is also possible to test whether a sample correlation r is large enough to indicate a nonzero population correlation Test statistic: t = r·√(n−2)/√(1−r²), which follows a t-distribution with n−2 degrees of freedom under the null hypothesis Note: The test only works for normal distributions and linear correlations: Always also investigate the scatter plot!
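As an illustration (not part of the original slides, which use SPSS), Pearson’s r and the test statistic above can be computed directly in pure Python; the data here are invented example numbers:

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_t(x, y):
    """t statistic for H0: population correlation = 0 (df = n - 2)."""
    r, n = pearson_r(x, y), len(x)
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Invented data with a strong positive linear trend
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
r = pearson_r(x, y)
t = correlation_t(x, y)
```

The t value is then compared with the percentiles of the t-distribution with n−2 degrees of freedom, exactly as in the general testing recipe earlier.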

Pearson’s correlation coefficient in SPSS: Analyze -> Correlate -> Bivariate Check Pearson Tests whether r is significantly different from 0 The null hypothesis is that r=0 Assumptions: the variables are normally distributed, and the observations are independent

Example:

Correlation from SPSS:

If the data are not normally distributed: Spearman’s rank correlation, rs Measures all monotonous relationships, not only linear ones No distributional assumptions rs is between −1 and 1, similar to Pearson’s correlation coefficient In SPSS: Analyze -> Correlate -> Bivariate Check Spearman Also provides a test of whether rs is different from 0
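To make the idea concrete (again a sketch, not from the slides): Spearman’s rs is simply Pearson’s r applied to the ranks of the data, so a perfectly monotone but nonlinear relationship gives rs = 1 while r < 1. This toy version assumes no ties in the data:

```python
import math

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rs(x, y):
    """Spearman's rank correlation: Pearson's r on the ranks.
    Simplified sketch that assumes no tied values."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r
    return pearson_r(ranks(x), ranks(y))

# Monotone but nonlinear: y = x**3
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]
rs = spearman_rs(x, y)
```

Here rs equals 1 because the relationship is perfectly increasing, even though the points do not lie on a straight line.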

Spearman correlation:

Linear regression Wish to fit a line as close to the observed data (two normally distributed variables) as possible Example: Birth weight=a+b*mother’s weight In SPSS: Analyze -> Regression -> Linear Click Statistics and check Confidence intervals for B Choose one variable (birth weight) as dependent, and one variable (mother’s weight) as independent Important to know which variable is your dependent variable!

Connection between variables Fit a line!

The standard simple regression model We define a model yi = β0 + β1·xi + εi, where the εi are independent, normally distributed with mean 0 and equal variance σ² We can then use data to estimate the model parameters, and to make statements about their uncertainty

What can you do with a fitted line? Interpolation Extrapolation (sometimes dangerous!) Interpret the parameters of the line

How to define the line that ”fits best”? Note: Many other ways to fit the line can be imagined Here, the line is chosen so that the sum of the squares of the ”errors” is minimized = Least squares method!

How to compute the line fit with the least squares method? Let (x1, y1), (x2, y2), ..., (xn, yn) denote the points in the plane. Find a and b so that y=a+bx fits the points by minimizing S = Σ(yi − a − b·xi)² Solution: b = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)² and a = ȳ − b·x̄, where x̄ = (1/n)Σxi, ȳ = (1/n)Σyi, and all sums are done for i=1,...,n.
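The least-squares solution can be written out as a short Python sketch (illustrative only; the data below are invented numbers lying roughly on y = 1 + 2x):

```python
def least_squares(x, y):
    """Fit y = a + b*x by minimizing the sum of squared errors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # b = sum((xi - mean(x)) * (yi - mean(y))) / sum((xi - mean(x))**2)
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    a = my - b * mx          # a = mean(y) - b * mean(x)
    return a, b

x = [1.0, 2.0, 3.0, 4.0]
y = [3.1, 4.9, 7.1, 8.9]
a, b = least_squares(x, y)   # close to a = 1, b = 2
```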

How do you get this answer? Differentiate S with respect to a and b, and set the results to 0 We get: Σ(yi − a − b·xi) = 0 and Σxi(yi − a − b·xi) = 0 This is two equations with two unknowns, and solving them gives the answer.

y against x ≠ x against y Linear regression of y against x does not give the same result as the opposite. (Figure: two fitted lines, labeled ”Regression of x against y” and ”Regression of y against x”.)

Analyzing the variance Define –SSE = Σ(yi − ŷi)²: Error sum of squares –SSR = Σ(ŷi − ȳ)²: Regression sum of squares –SST = Σ(yi − ȳ)²: Total sum of squares We can show that SST = SSR + SSE Define R² = SSR/SST = 1 − SSE/SST R² is the ”coefficient of determination”
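The decomposition SST = SSR + SSE can be checked numerically with a small sketch (invented example data, same toy numbers as before):

```python
def anova_decomposition(x, y):
    """Least-squares fit, then the sums of squares: SST = SSR + SSE."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    a = my - b * mx
    yhat = [a + b * xi for xi in x]                         # fitted values
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))    # error SS
    ssr = sum((yh - my) ** 2 for yh in yhat)                # regression SS
    sst = sum((yi - my) ** 2 for yi in y)                   # total SS
    return sse, ssr, sst

x = [1.0, 2.0, 3.0, 4.0]
y = [3.1, 4.9, 7.1, 8.9]
sse, ssr, sst = anova_decomposition(x, y)
r2 = ssr / sst   # coefficient of determination, close to 1 here
```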

Assumptions Usually check that the dependent variable is normally distributed More formally, the residuals, i.e. the vertical distances from each observation to the line, should be normally distributed In SPSS: –In Linear Regression, click Statistics. Under Residuals check Casewise diagnostics, and you will get ”outliers” larger than 3 or smaller than −3 in a separate table. –In Linear Regression, also click Plots. Under Standardized residual plots, check Histogram and Normal probability plot. Choose *ZRESID as the y-variable and *ZPRED as the x-variable

Example: Regression of birth weight with mother’s weight as independent variable

Residuals:

Check of assumptions:

Check of assumptions cont’d:

Interpretation: Have fitted the line Birth weight = 2369.6 + 4.429*mother’s weight If mother’s weight increases by 20 pounds, what is the predicted impact on the infant’s birth weight? 4.429*20 ≈ 89 grams What’s the predicted birth weight of an infant with a 150-pound mother? 2369.6 + 4.429*150 ≈ 3034 grams
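The arithmetic above as a tiny sketch. Note: the slope 4.429 and the prediction of about 3034 g at 150 pounds are from the slide; the intercept below is back-solved from those two numbers and should be checked against the actual SPSS output:

```python
# Coefficients of the fitted line (slope from the slide; intercept
# back-solved from the slide's prediction of ~3034 g at 150 pounds)
INTERCEPT = 2369.6   # grams (derived, not stated on the slide)
SLOPE = 4.429        # grams per pound of mother's weight

def predicted_birth_weight(mothers_weight_lb):
    return INTERCEPT + SLOPE * mothers_weight_lb

gain_for_20lb = SLOPE * 20                 # predicted impact of +20 pounds
bw_150 = predicted_birth_weight(150)       # prediction at 150 pounds
```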

Influence of extreme observations NOTE: The result of a regression analysis is very much influenced by points with extreme values, in either the x or the y direction. Always investigate visually, and determine if outliers are actually erroneous observations

But how to answer questions like: Given that a positive slope (b) has been estimated: Does it give a reproducible indication that there is a positive trend, or is it a result of random variation? What is a confidence interval for the estimated slope? What is the prediction, with uncertainty, at a new x value?

Confidence intervals for simple regression In a simple regression model, –a estimates β0 –b estimates β1 –s² = SSE/(n−2) estimates σ² Also, (b − β1)/sb follows a t-distribution with n−2 degrees of freedom, where sb² = s²/Σ(xi − x̄)² estimates the variance of b So a confidence interval for β1 is given by b ± t(n−2, α/2)·sb

Hypothesis testing for simple regression Choose hypotheses: H0: β1 = 0 against H1: β1 ≠ 0 Test statistic: t = b/sb, which follows a t-distribution with n−2 degrees of freedom under H0 Reject H0 if t > t(n−2, α/2) or t < −t(n−2, α/2)
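The whole inference pipeline for the slope can be sketched in a few lines of Python (illustrative only; the data are invented, roughly y = 2 + 3x plus small errors, and the normal quantile is used in place of the exact t(n−2) quantile, which is a reasonable approximation for large n):

```python
import math
from statistics import NormalDist

def slope_inference(x, y, alpha=0.05):
    """Slope estimate b, its standard error s_b, the t statistic for
    H0: beta1 = 0, and an approximate confidence interval for beta1."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    sse = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    s2 = sse / (n - 2)            # estimate of sigma^2
    s_b = math.sqrt(s2 / sxx)     # standard error of b
    t = b / s_b                   # test statistic for H0: beta1 = 0
    # normal quantile as stand-in for the t_{n-2} quantile (large-n approx.)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return b, s_b, t, (b - z * s_b, b + z * s_b)

# Invented data: y roughly 2 + 3x with small perturbations
x = list(range(1, 11))
y = [5.1, 7.8, 11.15, 13.9, 17.2, 19.85, 23.1, 25.8, 29.15, 31.9]
b, s_b, t, ci = slope_inference(x, y)
```

Here t is far above the critical value and the confidence interval excludes 0, so H0: β1 = 0 would be rejected, matching the recipe on the slide.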