Copyright (c) Bani K. Mallick. STAT 651 Lecture #18.



Presentation transcript:

1 STAT 651 Lecture #18

2 Topics in Lecture #18: Regression and Scatterplots

3 Book Chapters in Lecture #18: Chapters 11.1, 11.2

4 Linear Regression and Correlation: Linear regression and correlation are aimed at understanding how two variables are related. The variables are called Y and X. Y is called the dependent variable; X is called the independent variable. We want to know how, and whether, X influences Y.

5 Linear Regression and Correlation: The basic tool of regression is the scatterplot. This simply plots the data in a graph. X is along the horizontal (or X) axis; Y is along the vertical (or Y) axis.

6 GPA and Height: Note how GPAs generally get lower as height increases; the data do not fall exactly on a line.

7 Log(1+Aortic Valve Area) and Body Surface Area: Note how AVAs generally get larger as Body Surface Area increases; the data do not fall exactly on a line.

8 Exam Grades in Stat651

9 Linear Regression and Correlation: Let Y = GPA, X = height. A linear prediction equation is a line, such as Ŷ = b₀ + b₁X. The intercept of the line is b₀. The slope of the line is b₁.

10 Linear Regression and Correlation: b₀ is the place where the line crosses the Y axis, i.e. the value of the line when X = 0.

11 Linear Regression and Correlation: b₁ is how much the line increases when X is increased by 1 unit.
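The two interpretations above can be checked directly. This is a small illustrative sketch (the intercept and slope values are made up, not from the lecture): evaluating a line at X = 0 gives the intercept, and raising X by 1 unit changes the line's value by the slope.

```python
# Any line y = b0 + b1 * x; the values here are purely illustrative.
b0, b1 = 2.0, 0.5

def line(x):
    """Value of the line at x."""
    return b0 + b1 * x

# Where the line crosses the Y axis (X = 0) is the intercept.
print(line(0))

# Increasing X by 1 unit increases the line by the slope.
print(line(3) - line(2))
```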

12 Linear Regression and Correlation: Every one of us will draw a slightly different line through the data. We need an algorithm to construct a line that in some sense “best fits the data”. The usual method, called least squares, tries to make the squared distance between the line and the actual data as small as possible.

13 GPA and Height: Try your hand at drawing a line through these data. Draw on the paper by eye, and compare it to my line drawn by eye on the next slide.

14 GPA and Height: My line drawn by eye through the same data.

15 Linear Regression and Correlation: Every one of us will draw a slightly different line through the data. We need an algorithm to construct a line that in some sense “best fits the data”. The usual method, called least squares, tries to make the squared distance between the line and the actual data as small as possible.

16 Linear Regression and Correlation: The usual method, called least squares, tries to make the squared distance between the line and the actual data as small as possible. The data are (X₁, Y₁), …, (Xₙ, Yₙ). Any line is Ŷ = b₀ + b₁X. The total squared distance is Σᵢ (Yᵢ − b₀ − b₁Xᵢ)². The slope & intercept are chosen to minimize this total squared distance.
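The least squares idea can be made concrete numerically. In this sketch (the height/GPA numbers are hypothetical, not the class data), the least squares line has a total squared distance no larger than any nearby line you might draw by eye:

```python
import numpy as np

# Hypothetical (height, GPA) data for illustration only.
x = np.array([60.0, 64.0, 66.0, 70.0, 72.0])
y = np.array([3.8, 3.5, 3.4, 3.1, 2.9])

def total_squared_distance(b0, b1):
    """Sum over i of (Y_i - (b0 + b1 * X_i))^2."""
    return np.sum((y - (b0 + b1 * x)) ** 2)

# Least squares estimates via a degree-1 polynomial fit
# (np.polyfit returns the slope first, then the intercept).
b1_hat, b0_hat = np.polyfit(x, y, 1)

# Any other line has a larger (or equal) total squared distance.
print(total_squared_distance(b0_hat, b1_hat)
      <= total_squared_distance(b0_hat + 0.1, b1_hat))
```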

17 GPA and Height: Distance of an observation (in red) to the line (in blue).

18 Linear Regression and Correlation: The total squared distance is Σᵢ (Yᵢ − b₀ − b₁Xᵢ)². The slope & intercept are chosen to minimize this total squared distance. The slope is b₁ = Σᵢ (Xᵢ − X̄)(Yᵢ − Ȳ) / Σᵢ (Xᵢ − X̄)².

19 Linear Regression and Correlation: The total squared distance is Σᵢ (Yᵢ − b₀ − b₁Xᵢ)². The slope & intercept are chosen to minimize this total squared distance. The intercept is b₀ = Ȳ − b₁X̄. This is algebra! The estimates are called the least squares estimates. SPSS calculates these automatically.
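The closed-form slope and intercept formulas can be sketched directly (again with hypothetical data, since the lecture uses SPSS); they agree with a generic degree-1 least squares fit:

```python
import numpy as np

# Hypothetical data for illustration only.
x = np.array([60.0, 64.0, 66.0, 70.0, 72.0])
y = np.array([3.8, 3.5, 3.4, 3.1, 2.9])

xbar, ybar = x.mean(), y.mean()

# Slope: sum of (X_i - Xbar)(Y_i - Ybar) over sum of (X_i - Xbar)^2
b1 = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)

# Intercept: Ybar - b1 * Xbar
b0 = ybar - b1 * xbar

# Cross-check against numpy's least squares polynomial fit.
slope_check, intercept_check = np.polyfit(x, y, 1)
print(np.isclose(b1, slope_check), np.isclose(b0, intercept_check))
```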

20 Linear Regression and Correlation: Intercept & slope are in the “B” column. Constant = intercept.

21 Linear Regression and Correlation: Intercept = 5.529 & slope = -0.0372 are in the “B” column.

22 Linear Regression and Correlation: Intercept = 5.529 & slope = -0.0372 are in the “B” column. This means the least squares line is GPA = 5.529 – 0.0372 * Height. Interpretation #1: The slope of the line is negative, indicating a possible negative relationship between height and GPA.

23 Linear Regression and Correlation: Intercept = 5.529 & slope = -0.0372 are in the “B” column. This means the least squares line is GPA = 5.529 – 0.0372 * Height. Interpretation #2: The least squares line suggests that for every inch in height added, the GPA decreases by 0.0372.

24 Linear Regression and Correlation: Intercept = 5.529 & slope = -0.0372 are in the “B” column. This means the least squares line is GPA = 5.529 – 0.0372 * Height. Interpretation #3: The least squares line suggests that if someone is 64 inches tall, his/her GPA might be predicted by 5.529 – 0.0372 * 64 = 3.148.
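The prediction arithmetic above can be sketched as a tiny function built from the fitted line reported in the SPSS output:

```python
# Least squares line from the slides: GPA = 5.529 - 0.0372 * Height
def predict_gpa(height_inches):
    """Predicted GPA for a given height, using the fitted line."""
    return 5.529 - 0.0372 * height_inches

# The slide's example: a 64-inch-tall student.
print(round(predict_gpa(64), 3))  # 3.148, matching the slide
```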

25 Linear Regression and Correlation: Intercept = 5.529 & slope = -0.0372 are in the “B” column. This means the least squares line is GPA = 5.529 – 0.0372 * Height. Interpretation #4: There is something fishy here. Why should height predict GPA? Makes no sense to me.

26 Linear Regression and Correlation: SPSS can draw a scatterplot and put the least squares line into it. Use “Graphs”, “Interactive”, “Scatterplot”. Fix the “Title”. Under “Fit”, click on “Total”. You can remove the label for the line & R-square. SPSS demo on the exam scores for 2001: need to change the status of the variable from categorical.

27 Linear Regression and Correlation: Depending on how you input the data, SPSS may insist that your numerical variable is actually categorical. When you do the interactive plot, right click, then click on “Scale” to convert it.

28 Linear Regression and Correlation: You can manipulate the graph by double clicking on it and then moving things around. Note how the least squares line is part of the graph.

29 Linear Regression and Correlation: You can manipulate the graph by double clicking on it and then moving things around. Note how the least squares line is part of the graph.

30 Linear Regression and Correlation: The population parameters β₀ and β₁ are simply the least squares estimates computed on all the members of the population, not just the sample. Population parameters: β₀, β₁. Sample statistics: b₀, b₁.

31 Linear Regression and Correlation: Formally speaking, the linear regression model says that Y and X are related: Y = β₀ + β₁X + ε. Here ε is the error (or deviation from the line). Also, β₀ and β₁ are the population intercept and slope.

32 Linear Regression and Correlation: Formally speaking, the linear regression model says that Y and X are related: Y = β₀ + β₁X + ε. The meaning of the line is: take the (sub)population all of whom have independent variable value X. The mean of this (sub)population is β₀ + β₁X.

33 Linear Regression and Correlation: Formally speaking, the linear regression model says that Y and X are related: Y = β₀ + β₁X + ε. In order to make inference about the population slope and intercept, we need to make a few assumptions.
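The population model Y = β₀ + β₁X + ε can be illustrated by simulation (the parameter values below are invented for the sketch): generate data from the model, then check that least squares approximately recovers the population intercept and slope.

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented population parameters, for illustration only.
beta0, beta1, sigma = 5.5, -0.04, 0.2

# Simulate the model Y = beta0 + beta1 * X + epsilon.
n = 500
x = rng.uniform(60, 75, size=n)
eps = rng.normal(0.0, sigma, size=n)
y = beta0 + beta1 * x + eps

# Sample statistics b0, b1 estimate the population parameters.
b1_hat, b0_hat = np.polyfit(x, y, 1)
print(abs(b1_hat - beta1) < 0.01, abs(b0_hat - beta0) < 0.5)
```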

34 Linear Regression and Correlation: Assumption #1: A straight line really fits the data (sort of by inspection). There is no point fitting a straight line to data that scatter along a clearly curved pattern.

35 Linear Regression and Correlation: Assumption #2: The errors ε are at least vaguely normally distributed. This is important for inference, especially the normal ranges we will construct later.

36 Linear Regression and Correlation: Assumption #3: The errors ε have somewhat the same variances. This is important for inference, especially the normal ranges we will construct later. It is also important for making inferences about the population slope.

37 Linear Regression and Correlation: Assumption #1: A straight line really fits the data. Assumption #2: The errors ε are at least vaguely normally distributed. Assumption #3: The errors ε have somewhat the same variances. We will build graphical ways to check these assumptions.
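The graphical checks start from the residuals, i.e. the vertical distances from each point to the fitted line. A minimal sketch (hypothetical data): compute the residuals from a least squares fit and confirm a basic algebraic property, that with an intercept in the model the residuals sum to zero; the diagnostics then look at the residuals' shape (residuals vs X for straightness and equal variance, a histogram for normality).

```python
import numpy as np

# Hypothetical (height, GPA)-style data, for illustration only.
x = np.array([60.0, 62.0, 64.0, 66.0, 68.0, 70.0, 72.0])
y = np.array([3.9, 3.6, 3.5, 3.4, 3.2, 3.0, 2.9])

b1, b0 = np.polyfit(x, y, 1)

# Residuals: observed Y minus the fitted line.
residuals = y - (b0 + b1 * x)

# Least squares residuals always sum to (numerically) zero, so the
# checks focus on their shape, not their average level.
print(np.isclose(residuals.sum(), 0.0, atol=1e-8))
```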

