Module 19: Simple Linear Regression


1 19 - 1 Module 19: Simple Linear Regression This module focuses on simple linear regression and thus begins the process of exploring one of the most widely used and powerful statistical tools. Reviewed 11 May 05

2 19 - 2 Goldman-Tono-Pen Example An ophthalmologist who is assessing intraocular pressures as part of a community program for the prevention of glaucoma is interested in using a portable device (Tono-Pen) for making these measurements. An important question is how well the measurements made with this device compare to those made with a more standard device (Goldman) used in clinical settings. To address this question, the ophthalmologist compared the two devices by using each on n = 40 eyes. For this comparison, each eye was measured once with each device.

3 19 - 3 Goldman-Tono-Pen Example Data

4 19 - 4 Comparing the Two Devices One approach to comparing the two devices would be a paired t-test. This test is appropriate because the measurements made by the two devices on the same eyes cannot be considered independent and because the differences between the two measurements are of interest.

5 19 - 5 Goldman-Tono-Pen Worksheet

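6 19 - 6

| Statistic | Goldman (X = G) | Tono-Pen (Y = T) | d = G - T | d^2 |
|---|---|---|---|---|
| n | 40 | 40 | 40 | |
| Sum | 799 | 766 | 33 | 451 |
| Mean | 19.975 | 19.15 | 0.825 | |
| SD | 3.7106 | 4.185 | 3.296 | |
| (Sum)^2 / n | 15,960.03 | 14,668.90 | 27.23 | |
| Sum of squared values | 16,497 | 15,352 | 451 | |
| SS | 536.98 | 683.10 | 423.78 | |
| s^2 | 13.77 | 17.52 | 10.87 | |
| SE | 0.587 | 0.662 | 0.521 | |

t = mean(d)/SE(d) = 1.58
df = n - 1 = 39
t 0.975 (39) = 2.02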

7 19 - 7
1. Hypothesis: $H_0: \Delta = \mu_G - \mu_T = 0$ vs. $H_1: \Delta \neq 0$
2. Assumptions: The differences are a random sample from a normal distribution
3. The $\alpha$ level: $\alpha = 0.05$
4. Test statistic: $t = \bar{d} / SE(\bar{d})$
5. The rejection region: Reject $H_0$ if $t$ is not between $\pm t_{0.975}(39) = \pm 2.02$
6. The result: $t = 0.825 / 0.521 = 1.58$
7. The conclusion: Do not reject $H_0: \Delta = \mu_G - \mu_T = 0$, since $t = 1.58$ is between $\pm 2.02$.
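The paired t-test arithmetic can be checked from the worksheet summary values alone. The following is a minimal sketch, assuming only n = 40, the sum of the differences (33), and the sum of the squared differences (451); the variable names are illustrative and not part of the original module.

```python
import math

# Summary values for the differences d = G - T, taken from the worksheet.
n = 40          # number of eyes
sum_d = 33      # sum of the differences
sum_d2 = 451    # sum of the squared differences

mean_d = sum_d / n                  # mean difference
ss_d = sum_d2 - sum_d ** 2 / n      # corrected sum of squares of d
var_d = ss_d / (n - 1)              # sample variance of d
se_d = math.sqrt(var_d / n)         # standard error of the mean difference
t_stat = mean_d / se_d              # paired t statistic

t_crit = 2.02                       # t_0.975(39), as quoted in the module
reject_h0 = abs(t_stat) > t_crit    # two-sided test at alpha = 0.05

print(f"mean(d) = {mean_d:.3f}, SE = {se_d:.3f}, t = {t_stat:.2f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

Working from sums rather than raw data mirrors the worksheet layout; with the 40 raw pairs available, the same statistic could be computed directly from the individual differences.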

8 19 - 8 Hence, from this standpoint, we do not have compelling evidence that the two devices are measuring intra-ocular pressures differently. Is this a sufficient assessment of the situation, or should we look further?

9 19 - 9 Looking Further One way to look further at this situation is to think about the relationship between the measurements made by the two machines in terms of simple linear regression. In this context, we ask whether higher values on one machine correspond to higher values on the other. Simple linear regression focuses on a possible straight-line relationship between the measurements made by the two machines.

10 19 - 10 Simple Linear Regression Concepts In general, simple linear regression finds the best straight line for describing the relationship between two variables. In its simplest form, which is what we consider here, it does not do a very good job of assessing how well the line describes the data, but it nevertheless provides useful information.

11 19 - 11 a = Intercept, that is, the point where the line crosses the y-axis, which is the value of y at x = 0. b = Slope of the regression line, that is, the number of units of increase (positive slope) or decrease (negative slope) in y for each unit increase in x. [Figure: regression line plotted with the independent variable on the x-axis and the dependent variable on the y-axis; the line crosses the y-axis at a.]

12 19 - 12 The Regression Line

15 19 - 15 The context for simple linear regression is that we have a random sample of persons from a set of well-defined populations, each defined by a specific value of the x-variable. We also have measurements of another variable, the y-variable, so that we have two variables for each person. For simple linear regression, we focus on a straight line that depicts the relationship between these two variables. The best straight line is the one for which the sum of the squared vertical distances of the points from the line is the least. This "least squares" line has slope $b = SS(xy)/SS(x) = \sum (x - \bar{x})(y - \bar{y}) / \sum (x - \bar{x})^2$ and intercept $a = \bar{y} - b\bar{x}$.

16 19 - 16 For this situation, the sample line $\hat{y} = a + bx$ is an estimate of the population line, and a and b are estimates of $\alpha$ and $\beta$, respectively. For a specific value of x, such as x = 10, the value for y calculated from the regression equation is $\hat{y} = a + b(10)$, which is called the regression estimate of Y at the value x = 10.

17 19 - 17 Simple Regression Example The following data are diastolic blood pressure (DBP) measurements taken at different times after an intervention for n = 5 persons. For each person, the data available include the time of the measurement and the DBP level. Of interest is the relationship between these two variables.

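18 19 - 18

| Patient | Time x | x^2 | DBP y | y^2 | xy |
|---|---|---|---|---|---|
| 1 | 0 | 0 | 72 | 5,184 | 0 |
| 2 | 5 | 25 | 66 | 4,356 | 330 |
| 3 | 10 | 100 | 70 | 4,900 | 700 |
| 4 | 15 | 225 | 64 | 4,096 | 960 |
| 5 | 20 | 400 | 66 | 4,356 | 1,320 |
| Sum | 50 | 750 | 338 | 22,892 | 3,310 |
| Mean | 10 | | 67.6 | | |
| n | 5 | | 5 | | |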
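The least squares slope and intercept for these five patients can be worked out from the column sums. The sketch below assumes the standard computational formulas SS(x) = Σx² − (Σx)²/n and SS(xy) = Σxy − (Σx)(Σy)/n, with b = SS(xy)/SS(x) and a = ȳ − b·x̄; the variable names are illustrative.

```python
# DBP example: time of measurement (x) and diastolic blood pressure (y).
x = [0, 5, 10, 15, 20]
y = [72, 66, 70, 64, 66]
n = len(x)

# Column sums, as in the worksheet.
sum_x, sum_y = sum(x), sum(y)
sum_x2 = sum(xi * xi for xi in x)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))

# Corrected sums of squares and cross-products.
ss_x = sum_x2 - sum_x ** 2 / n
ss_xy = sum_xy - sum_x * sum_y / n

# Least squares slope and intercept.
b = ss_xy / ss_x
a = sum_y / n - b * (sum_x / n)

# Regression estimate of Y at x = 10.
y_hat_10 = a + b * 10

print(f"SS(x) = {ss_x:.1f}, SS(xy) = {ss_xy:.1f}")
print(f"b = {b:.2f}, a = {a:.1f}, y-hat at x = 10: {y_hat_10:.1f}")
```

Because x = 10 is the mean of x here, the regression estimate at that point equals the mean of y, which is a useful check on the arithmetic.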

21 19 - 21 Example: AJPH, Dec. 2003; 93: 2099-2104

23 19 - 23 Never Smoking Regression Worksheet

24 19 - 24 For the never smoking data The slopes are

25 19 - 25 The intercepts are The best lines are:

26 19 - 26 $y_{\text{male}} = -1674.02 + 0.871x$ and $y_{\text{female}} = -501.29 + 0.285x$

27 19 - 27 Regression ANOVA If the regression line is flat, in the sense that the regression estimate $\hat{y}$ is the same for all values of x, then nothing is gained by considering the x variable, because it has no effect on $\hat{y}$. This situation occurs when the estimated slope b = 0. An important question is whether the population parameter $\beta = 0$, that is, whether there is truly no linear relationship between y and x. We can address this question with a formal test.

28 19 - 28
1. The hypothesis: $H_0: \beta = 0$ vs. $H_1: \beta \neq 0$
2. The $\alpha$ level: $\alpha = 0.05$
3. The assumptions: Random normal samples for the y-variable from populations defined by the x-variable
4. The test statistic: $F = MS(\text{Reg}) / MS(\text{Res})$
5. The rejection region: Reject $H_0: \beta = 0$ if the value calculated for F is greater than $F_{0.95}(1, n-2)$

29 19 - 29 $R^2$ is the proportion of the total variation in the dependent variable y that is explained by its regression relationship with x, that is, $R^2 = SS(\text{Reg}) / SS(\text{Total})$.
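To illustrate the regression ANOVA and R², the sketch below applies them to the five-patient diastolic blood pressure data given earlier. It assumes the usual partition SS(Total) = SS(Reg) + SS(Res) with SS(Reg) = SS(xy)²/SS(x); the function name `regression_anova` is hypothetical, and the critical value F 0.95 (1, 3) = 10.13 is taken from a standard F table rather than from the module.

```python
# DBP example: time of measurement (x) and diastolic blood pressure (y).
x = [0, 5, 10, 15, 20]
y = [72, 66, 70, 64, 66]


def regression_anova(x, y):
    """Return the ANOVA quantities for a simple linear regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ss_reg = ss_xy ** 2 / ss_x          # variation explained by the line
    ss_res = ss_total - ss_reg          # variation left over
    ms_res = ss_res / (n - 2)           # residual mean square, df = n - 2
    return {
        "ss_total": ss_total,
        "ss_reg": ss_reg,
        "ss_res": ss_res,
        "f": ss_reg / ms_res,           # regression df = 1, so MS(Reg) = SS(Reg)
        "r2": ss_reg / ss_total,
    }


result = regression_anova(x, y)
F_CRIT = 10.13  # F_0.95(1, 3) from a standard F table (not quoted in the module)

print(f"SS(Reg) = {result['ss_reg']:.1f}, SS(Res) = {result['ss_res']:.1f}")
print(f"F = {result['f']:.2f}, R^2 = {result['r2']:.3f}")
print("Reject H0" if result["f"] > F_CRIT else "Do not reject H0")
```

With only five observations the residual degrees of freedom are small, so a fairly large F is needed before the slope can be declared different from zero.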

30 19 - 30 Blood Pressure Example

32 19 - 32 Goldman-Tono-Pen Example We can apply these tools to the Goldman-Tono-Pen example. Note that although we test the null hypothesis $H_0: \beta = 0$, it is of little interest here because it is not a very meaningful hypothesis.

34 19 - 34 Create a new table

36 19 - 36 Regression ANOVA: Goldman-Tono-Pen Example
1. The hypothesis: $H_0: \beta = 0$ vs. $H_1: \beta \neq 0$
2. The assumptions: Random samples, x measured without error, y normally distributed for each level of x
3. The $\alpha$-level: $\alpha = 0.05$
4. The test statistic: ANOVA
5. The rejection region: Reject $H_0: \beta = 0$ if $F > F_{0.95}(1, n-2) = F_{0.95}(1, 38)$

38 19 - 38 Example: AJPH, Aug. 1999; 89: 1187-1193

41 19 - 41 $y = 3.92 + 0.24x$. At $x = 45$, $y = 14.72$. $r = 0.70$.
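The regression estimate at x = 45 follows directly from the fitted line. A minimal sketch, using the coefficients reported on this slide; the function name `predict` is illustrative only:

```python
def predict(x, a=3.92, b=0.24):
    """Regression estimate of y at x for the fitted line y = a + b*x."""
    return a + b * x


y_at_45 = predict(45)

# Squaring the reported correlation gives the share of variation in y
# accounted for by the straight-line relationship with x.
r = 0.70
r_squared = r ** 2

print(f"y-hat at x = 45: {y_at_45:.2f}")
print(f"r^2 = {r_squared:.2f}")
```

Estimates of this kind are most trustworthy for x values inside the range of the data used to fit the line.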

42 19 - 42 Regression ANOVA: Social Capital and Self-Rated Health Example
1. The hypothesis: $H_0: \beta = 0$ vs. $H_1: \beta \neq 0$
2. The assumptions: Random samples, x measured without error, y normally distributed for each level of x
3. The $\alpha$-level: $\alpha = 0.05$
4. The test statistic: ANOVA
5. The rejection region: Reject $H_0: \beta = 0$ if $F > F_{0.95}(1, n-2)$

44 19 - 44 Example: AJPH, July 1999; 89: 1059 -1065

47 19 - 47 Socioeconomic Environment and Adult Health Example

48 19 - 48 Socioeconomic Environment and Adult Health Example

| Quantity | Men | Women |
|---|---|---|
| SS(x) | 179.20 | 177.95 |
| SS(y) | 381.54 | 500.05 |
| SS(xy) | 247.01 | 277.68 |
| b | 1.38 | 1.56 |
| a | -1.57 | -2.25 |
| r | 0.9447 | 0.9309 |
| SS(Reg) | 340.50 | 433.30 |
| SS(Res) | 41.04 | 66.75 |
| SS(Total) | 381.54 | 500.05 |

49 19 - 49 Socioeconomic Environment and Adult Health Example
The same test is set up for men and for women:
1. The hypothesis: $H_0: \beta = 0$ vs. $H_1: \beta \neq 0$
2. The assumptions: Random samples, x measured without error, y normally distributed for each level of x
3. The $\alpha$-level: $\alpha = 0.05$
4. The test statistic: ANOVA
5. The rejection region: Reject $H_0: \beta = 0$ if $F > F_{0.95}(1, 11)$

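50 19 - 50 Regression ANOVA: Socioeconomic Environment and Adult Health Example
6. The result: ANOVA

Men

| Source | df | SS | MS | F |
|---|---|---|---|---|
| Regression | 1 | 340.50 | 340.50 | 91.29 |
| Residual | 11 | 41.04 | 3.73 | |
| Total | 12 | 381.54 | | |

Women

| Source | df | SS | MS | F |
|---|---|---|---|---|
| Regression | 1 | 433.30 | 433.30 | 71.38 |
| Residual | 11 | 66.75 | 6.07 | |
| Total | 12 | 500.05 | | |

7. The conclusion: For both men and women, reject $H_0: \beta = 0$, since $F > F_{0.95}(1, 11) = 4.84$.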
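The ANOVA quantities for this example can be rebuilt from SS(x), SS(y), and SS(xy) alone. The sketch below assumes n = 13 for each group (inferred from the total degrees of freedom of 12) and uses the reported sums of squares; the function name `anova_from_ss` is hypothetical. Small differences from the tabulated values are expected because the inputs are rounded.

```python
import math


def anova_from_ss(ss_x, ss_y, ss_xy, n):
    """Slope, correlation, and regression ANOVA from sums of squares."""
    b = ss_xy / ss_x                        # least squares slope
    r = ss_xy / math.sqrt(ss_x * ss_y)      # correlation coefficient
    ss_reg = ss_xy ** 2 / ss_x              # regression sum of squares
    ss_res = ss_y - ss_reg                  # residual sum of squares
    ms_res = ss_res / (n - 2)               # residual mean square
    return {
        "b": b,
        "r": r,
        "ss_reg": ss_reg,
        "ss_res": ss_res,
        "f": ss_reg / ms_res,               # regression df = 1
    }


# n = 13 is an assumption inferred from total df = 12.
men = anova_from_ss(ss_x=179.20, ss_y=381.54, ss_xy=247.01, n=13)
women = anova_from_ss(ss_x=177.95, ss_y=500.05, ss_xy=277.68, n=13)

for label, res in (("Men", men), ("Women", women)):
    print(f"{label}: b = {res['b']:.2f}, r = {res['r']:.4f}, "
          f"SS(Reg) = {res['ss_reg']:.2f}, F = {res['f']:.2f}")
```

Recomputed this way, F comes out near 91 for men and near 71 for women, both far above the 5% critical value for 1 and 11 degrees of freedom.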

