Examining Relationship of Variables  Response (dependent) variable - measures the outcome of a study.  Explanatory (Independent) variable - explains.

Slides:



Advertisements
Similar presentations
Chapter 3 Examining Relationships Lindsey Van Cleave AP Statistics September 24, 2006.
Advertisements

Scatterplots and Correlation
Correlation and Linear Regression.
Correlation and Regression By Walden University Statsupport Team March 2011.
Simple Linear Regression. Start by exploring the data Construct a scatterplot  Does a linear relationship between variables exist?  Is the relationship.
Chapter 6: Exploring Data: Relationships Lesson Plan
Regression Analysis Using Excel. Econometrics Econometrics is simply the statistical analysis of economic phenomena Here, we just summarize some of the.
AP Statistics Chapters 3 & 4 Measuring Relationships Between 2 Variables.
Some Terms Y =  o +  1 X Regression of Y on X Regress Y on X X called independent variable or predictor variable or covariate or factor Which factors.
Statistics for the Social Sciences
Chapter 12 Simple Regression
Copyright (c) Bani K. Mallick1 STAT 651 Lecture #18.
Chapter 12a Simple Linear Regression
The Simple Regression Model
Nemours Biomedical Research Statistics April 2, 2009 Tim Bunnell, Ph.D. & Jobayer Hossain, Ph.D. Nemours Bioinformatics Core Facility.
RESEARCH STATISTICS Jobayer Hossain Larry Holmes, Jr November 6, 2008 Examining Relationship of Variables.
7/2/ Lecture 51 STATS 330: Lecture 5. 7/2/ Lecture 52 Tutorials  These will cover computing details  Held in basement floor tutorial lab,
Correlation and Regression Analysis
Linear Regression/Correlation
Simple Linear Regression
Correlation & Regression
Advantages of Multivariate Analysis Close resemblance to how the researcher thinks. Close resemblance to how the researcher thinks. Easy visualisation.
1 Doing Statistics for Business Doing Statistics for Business Data, Inference, and Decision Making Marilyn K. Pelosi Theresa M. Sandifer Chapter 11 Regression.
Regression and Correlation Methods Judy Zhong Ph.D.
Introduction to Linear Regression and Correlation Analysis
Inference for regression - Simple linear regression
Relationship of two variables
Chapter 6: Exploring Data: Relationships Chi-Kwong Li Displaying Relationships: Scatterplots Regression Lines Correlation Least-Squares Regression Interpreting.
Chapter 3: Examining relationships between Data
1 Chapter 3: Examining Relationships 3.1Scatterplots 3.2Correlation 3.3Least-Squares Regression.
1 1 Slide Simple Linear Regression Part A n Simple Linear Regression Model n Least Squares Method n Coefficient of Determination n Model Assumptions n.
1 1 Slide © 2004 Thomson/South-Western Slides Prepared by JOHN S. LOUCKS St. Edward’s University Slides Prepared by JOHN S. LOUCKS St. Edward’s University.
Copyright © 2010 Pearson Education, Inc Chapter Seventeen Correlation and Regression.
Chapter 3 Section 3.1 Examining Relationships. Continue to ask the preliminary questions familiar from Chapter 1 and 2 What individuals do the data describe?
Chapters 8 & 9 Linear Regression & Regression Wisdom.
Copyright © 2010 Pearson Education, Inc Chapter Seventeen Correlation and Regression.
© Copyright McGraw-Hill Correlation and Regression CHAPTER 10.
Tutorial 4 MBP 1010 Kevin Brown. Correlation Review Pearson’s correlation coefficient – Varies between – 1 (perfect negative linear correlation) and 1.
Examining Bivariate Data Unit 3 – Statistics. Some Vocabulary Response aka Dependent Variable –Measures an outcome of a study Explanatory aka Independent.
Copyright © 2010 Pearson Education, Inc Chapter Seventeen Correlation and Regression.
1 1 Slide © 2003 South-Western/Thomson Learning™ Slides Prepared by JOHN S. LOUCKS St. Edward’s University.
Linear Models Alan Lee Sample presentation for STATS 760.
SIMPLE LINEAR REGRESSION AND CORRELLATION
Copyright © 2004 by The McGraw-Hill Companies, Inc. All rights reserved.
Stat 1510: Statistical Thinking and Concepts REGRESSION.
Lecture 10 Introduction to Linear Regression and Correlation Analysis.
Tutorial 5 Thursday February 14 MBP 1010 Kevin Brown.
© The McGraw-Hill Companies, Inc., Chapter 10 Correlation and Regression.
Nemours Biomedical Research Statistics April 9, 2009 Tim Bunnell, Ph.D. & Jobayer Hossain, Ph.D. Nemours Bioinformatics Core Facility.
BUSINESS MATHEMATICS & STATISTICS. Module 6 Correlation ( Lecture 28-29) Line Fitting ( Lectures 30-31) Time Series and Exponential Smoothing ( Lectures.
11-1 Copyright © 2014, 2011, and 2008 Pearson Education, Inc.
Week 2 Normal Distributions, Scatter Plots, Regression and Random.
The simple linear regression model and parameter estimation
Lecture 11: Simple Linear Regression
Regression Analysis AGEC 784.
Unit 4 LSRL.
LSRL.
Least Squares Regression Line.
Correlation and Simple Linear Regression
Correlation and regression
Chapter 3.2 LSRL.
Correlation and Simple Linear Regression
Least Squares Regression Line LSRL Chapter 7-continued
Correlation and Simple Linear Regression
M248: Analyzing data Block D UNIT D2 Regression.
Chapter 5 LSRL.
Chapter 5 LSRL.
Simple Linear Regression and Correlation
Correlation and Simple Linear Regression
Correlation and Simple Linear Regression
Presentation transcript:

Examining Relationship of Variables  Response (dependent) variable - measures the outcome of a study.  Explanatory (Independent) variable - explains or influences the changes in a response variable  Outlier - an observation that falls outside the overall pattern of the relationship.  Positive Association - An increase in an independent variable is associated with an increase in a dependent variable.  Negative Association - An increase in an independent variable is associated with a decrease in a dependent variable.

Scatterplot  Shows relationship between two variables (age and height in this case).  Reveals form, direction, and strength of the relationship.

Scatterplot - Demo  MS-Excel: Step 1: Select Chart wizard Step 2: From the chart type select XY(Scatter) Step 3: Select Chart from chart sub type and click next Step 4: Data range: range for variable in the x axis (age in our case), range for the variable in the y axis (hgt in our case) and click on next Step 5: Click on titles to write title and labels of the chart, click on axes and mark on Value (X) and Value (Y) axis, click on Gridlines and take all the mark off( if you want no gridlines), click on legend and put options of legend (if you don’t want legend) then click on Finish

Scatterplot - Demo  SPSS: Step 1: Click on Graphs and select Scatter/Dot Step 2: Select Simple scatter Step 3: Select variables for Yaxis (usually response variable) and Xaxis (explanatory variable) and click on Titles to write titles in Line 1. After writing title click on ‘Continue’. Step 4: Click on ‘Ok’.  R: plot(x, y, main= “Title name”, xlab=“x axis label name”, ylab=“y axis label name” ) where x and y are the data of two variables and for our example x is age and y is hgt.

Scatterplots Strong positive association Points are scattered with a poor association Strong negative association

Correlation of two variables  Correlation measures the direction and strength of the relationship between two quantitative variables. Suppose that we have data on variables x and y for n individuals. Then the correlation coefficient r between x and y is defined as, Where, s x and s y are the standard deviations of x and y.  r is always a number between -1 and 1. Values of r near 0 indicate little or no linear relationship. Values of r near -1 or 1 indicate a very strong linear relationship.  The extreme values r=1 or r=-1 occur only in the case of a perfect linear relationship, when the points lie exactly along a straight line.

Correlation of two variables  Positive r indicates positive association i.e. association between two variables in the same direction, and negative r indicates negative association.  Scatterplot of Height and Age shows that these two variables possess a strong, positive linear relationship. The correlation coefficient of these two variables is , which is very close to 1.

Correlation of two variables r=0.02 r=-0.96r=0.98

Correlation of two variables  MS-Excel: Step 1: Click on function Wizard f x Step 2: Select function Correl Step 3: Input range Array 1 and Array 2 (for our example age and hgt) and click on Ok.  Another option:  Check if the data of two variables are in a side by side position. If not, copy and paste them side by side in a new worksheet and follow the following steps- Step 1: Select Data Analysis from the menu Tools Step 2: Select Correlation and type or select Input and Output range and click on ok.

Correlation of two variables  SPSS: Step 1: Select Correlate from the menu Analyze Step2: Click on Bivariate and select the variables (age and hgt) and click on Ok  R: cor( variable 1, variable 2) e.g. age and hgt are the variable 1 and variable 2 in our example

Example 1: Calculating Correlation Coefficient of the variables Height and Age in our data set.  MS-Excel: Click on function Wizard f x > Correl > Input range Array 1: age Input range Array 1: hgt > ok. It gives correlation coefficient, r=.98. or Tools > data analysis > correlation > range for variables age and hgt > ok. It gives you the following results. agehgt age1 hgt0.9821

Example 1: Calculating Correlation Coefficient of the variables Height and Age in our data set.  SPSS: Analyze > Correlate > Bivariate > select age and hgt > ok. It gives the following result age hgt age Pearson Correlation sig (- tailed) N Hgt Pearson Correlation sig (- tailed) N

Example 1: Calculating Correlation Coefficient of the variables Height and Age in our data set.  R: >cor(age, hgt). It gives the correlation coefficient

Strong Association but no Correlation Gas mileage of an auto mobile first increases than decreases as the speed increases like the following data: Scatter plot shows an strong association. But calculated, r = 0, why? r = 0 ? It’s because the relationship is not linear and r measures the linear relationship between two variables.

Influence of an outlier  Consider the following data set of two variables X and Y: r =  After dropping the last pair, r = 0.996

Simple Linear Regression  Regression refers to the value of a response variable as a function of the value of an explanatory variable.  A regression model is a function that describes the relationship between response and explanatory variables.  A simple linear regression has one explanatory variable and the regression line is straight.  The linear relationship of variable Y and X can be written as in the following regression model form Y= b 0 + b 1 X + e where, ‘Y’ is the response variable, ‘X’ is the explanatory variable, ‘e’ is the residual (error), and b 0 and b 1 are two parameters. Basically, b o is the intercept and b 1 is the slope of a straight line y= b 0 + b 1 X.

Simple Linear Regression A simple regression line is fitted for height on age. The intercept is and the slope (regression coefficient) is.1877.

Simple Linear Regression Assumptions: 1.Response variable is normally distributed. 2.Relationship between the two variables is linear. 3.Observations of response variable are independent. 4.Residual error is normally distributed with mean 0 and constant standard deviation.

Simple Linear Regression  Estimating Parameters b 0 and b 1 Least Square method estimates b 0 and b 1 by fitting a straight line through the data points so that it minimizes the sum of square of the deviation from each data point. Formula:

Simple Linear Regression  Fitted Least Square Regression line Fitted Line: Where is the fitted / predicted value of i th observation (Y i ) of the response variable. Estimated Residual: Least square method estimates b 0 and b 1 to minimize the summed error:

Simple Linear Regression e1e1 e3e3 e2e2 In this example, a regression line (red line) has been fitted to a series of observations (blue diamonds) and residuals are shown for a few observations (arrows).  Fitted Least Square Regression line

Simple Linear Regression  Interpretation of the Regression Coefficient and Intercept Regression coefficient (b 1 ) reflects the change in the response variable Y for a unit change in the explanatory variable X. That is, the slope of the regression line. Intercept (b 0 ) estimates the value of the response variable Y without the influence of the explanatory variable X. That is, when the explanatory variable = 0.0.

Simple Linear Regression  MS-Excel: Step1: Select Data Analysis from the menu Tools Step2: Input Y range: range for the response (dependent) variable, Input X range: range for the independent variable Step 3: Make selections of out puts you want and click on ok  SPSS: Step 1: Select Regression from the menu Analyze Step 2: Select Linear and then for Dependent: Select dependent variable and for Independent(s): select Independent variable Step 3: Click on Statistics and select options, clik on Plots and select variables for scatter plots (if you want), click on continue an d then ok  R: lm(y~x, data) summary(lm(y~x, data)) where, y is the dependent variable, x is the independent variable and data is a data frame containing the variables in the model.

Example 2: MS-Excel Linear Regression output: Height on age

Example 2: SPSS Linear Regression output: Height on age

Example 2: R Linear Regression output: Height on age >lm (hgt ~ age) gives the following output Call: lm(formula = hgt ~ age) Coefficients: (Intercept) age R code summary (lm (hgt ~ age) ) gives the following output Call: lm(formula = hgt ~ age) Residuals: Min 1Q Median 3Q Max Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) <2e-16 *** age <2e-16 *** --- Signif. codes: 0 ‘***’ ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: 1.07 on 58 degrees of freedom Multiple R-Squared: , Adjusted R-squared: F-statistic: 1659 on 1 and 58 DF, p-value: < 2.2e-16

Multiple Regression  Two or more independent variables to predict a single dependent variable.  Multiple regression model of Y on p number of explanatory variables can be written as, Y = b 0 + b 1 X 1 + b 2 X 2 +… +b p X p +e where b i (i=1,2, …, p) is the regression coefficient of X i

Multiple Regression  Fitted Y is given by,  The estimated residual error is the same as that in the simple linear regression,

Multiple Regression  MS-Excel: Every thing is the same as for the simple regression except we need to select multiple columns for the range of independent variables. If the columns of independent variables are not side by side, then we need to have them side by side. For that we can insert another excel sheet and copy the data from original sheet and paste the data in the new sheet. We first copy the dependent variable and paste that in one of the columns and then copy independent variables and paste them in side by side columns.  SPSS: Everything is the same as for the simple regression except we select more than one independent variables.  R: Every thing is the same as for the simple regression except adding more independent variables in the model. i.e. lm(y~x1 + x2, data) summary(lm(y~x1 + x2, data)) where, y is the dependent variable, x1 and x2 are independent variables and data is a data frame containing the variables in the model.

Example 3: MS Excel Multiple Regression output: Height on Age and PLUC.post

Example 3: SPSS Multiple Regression output: Height on Age and PLUC.post

Example 3: R Multiple Regression output: Height on Age and PLUC.post R code lm(hgt ~ age + PLUC.post) gives the following output Coefficients: (Intercept) age PLUC.post R code summary(lm(hgt~age+PLUC.post)) gives the following output Residuals: Min 1Q Median 3Q Max Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) <2e-16 *** age <2e-16 *** PLUC.post Signif. codes: 0 ‘***’ ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Residual standard error: on 57 degrees of freedom Multiple R-Squared: , Adjusted R-squared: F-statistic: 819 on 2 and 57 DF, p-value: < 2.2e-16

Coefficient of Determination (Multiple R-squared)  Total variation in the response variable Y is due to (i) regression of all variables in the model (ii) residual (error).  Total variation of y, SS (y) = SS(Regression) +SS(Residual)  The Coefficient of Determination is,

Coefficient of Determination (Multiple R-squared)  R 2 lies between 0 and 1.  R 2 = 0.8 implies that 80% of the total variation in the response variable Y is due to the contribution of all explanatory variables in the model. That is, the fitted regression model explains 80% of the variance in the response variable.

Example 4: Calculating coefficient of determination for the variables in example 3.  MS-Excel: It’s in the MS-Excel output of Example 3.  SPSS: It’s in the SPSS output of Example 3.  R: It’s in the R output of Example 3.