Copyright © Cengage Learning. All rights reserved. 13 Linear Correlation and Regression Analysis.

Presentation transcript:


Inferences Concerning the Slope of the Regression Line

3 Inferences Concerning the Slope of the Regression Line
If random samples of size n are repeatedly taken from a bivariate population, then the calculated slopes, the $b_1$'s, will form a sampling distribution that is normally distributed with a mean of $\beta_1$, the population value of the slope, and with a variance of $\sigma_{b_1}^2$, where

$$\sigma_{b_1}^2 = \frac{\sigma_\epsilon^2}{\sum (x - \bar{x})^2} \tag{13.10}$$

provided there is no lack of fit. An appropriate estimator for $\sigma_{b_1}^2$ is obtained by replacing $\sigma_\epsilon^2$ by $s_e^2$, the estimate of the variance of the error about the regression line:

$$s_{b_1}^2 = \frac{s_e^2}{\sum (x - \bar{x})^2} \tag{13.11}$$

4 Inferences Concerning the Slope of the Regression Line
This formula may be rewritten in the following, more manageable form:

$$s_{b_1}^2 = \frac{s_e^2}{\mathrm{SS}(x)} = \frac{s_e^2}{\sum x^2 - \dfrac{(\sum x)^2}{n}}$$

5 Inferences Concerning the Slope of the Regression Line
Note: The “standard error of ___” is the standard deviation of the sampling distribution of ___. Therefore, the standard error of regression (slope) is $\sigma_{b_1}$ and is estimated by $s_{b_1}$.
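The estimate of the slope's standard error can be sketched in a few lines of Python. This is only an illustration: the function name `slope_standard_error` and the five data points are made up for the example and are not the commute data from the slides. It computes SS(x) with the shortcut form, fits the least-squares line, and divides the error variance $s_e^2$ (with n – 2 degrees of freedom) by SS(x).

```python
from math import sqrt


def slope_standard_error(xs, ys):
    """Return (b1, s_e^2, s_b1^2, s_b1) for a simple linear regression.

    Hypothetical helper for illustration; assumes len(xs) == len(ys) > 2.
    """
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))

    # Shortcut forms: SS(x) = sum(x^2) - (sum x)^2 / n, and similarly SS(xy).
    ss_x = sum_x2 - sum_x ** 2 / n
    ss_xy = sum_xy - sum_x * sum_y / n

    b1 = ss_xy / ss_x                   # sample slope
    b0 = (sum_y - b1 * sum_x) / n       # sample intercept

    # Variance of y about the line of best fit, with n - 2 degrees of freedom.
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s_e_sq = sse / (n - 2)

    s_b1_sq = s_e_sq / ss_x             # estimated variance of the slope
    return b1, s_e_sq, s_b1_sq, sqrt(s_b1_sq)


# Made-up data, used only to exercise the function.
b1, s_e_sq, s_b1_sq, s_b1 = slope_standard_error([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(b1, 2), round(s_e_sq, 2), round(s_b1_sq, 2))
```

For these made-up points SS(x) = 10 and the residual sum of squares is 2.4, so the slope variance is (2.4 / 3) / 10.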

6 Inferences Concerning the Slope of the Regression Line
Assumptions for inferences about the linear regression: The set of (x, y) ordered pairs forms a random sample, and the y values at each x have a normal distribution. Since the population standard deviation is unknown and replaced with the sample standard deviation, the t-distribution will be used with n – 2 degrees of freedom.

7 Confidence Interval Procedure

8 Confidence Interval Procedure
The slope, $\beta_1$, of the regression line of the population can be estimated by means of a confidence interval:

$$b_1 \pm t(n - 2, \alpha/2) \cdot s_{b_1} \tag{13.14}$$

9 Example 7 – Constructing a Confidence Interval for $\beta_1$, the Population Slope of the Line of Best Fit
Suppose you move to a new city and find a job. You will, of course, be concerned about the problems you will face commuting to and from work. For example, you would like to know how long it will take you to drive to work each morning. Let’s use “one-way distance to work” as a measure of where you live. You live x miles away from work and want to know how long it will take you to commute each day. Your new employer, foreseeing this question, has already collected a random sample of data to be used in answering your question. Fifteen of your new co-workers were asked to give their one-way travel times and distances to work.

10 Example 7 – Constructing a Confidence Interval for $\beta_1$, the Population Slope of the Line of Best Fit
The resulting data are shown in Table 13.2. (For convenience, the data have been arranged so that the x values are in numerical order.) Find the line of best fit and the variance of y about the line of best fit, $s_e^2$. Find the 95% confidence interval for the population’s slope, $\beta_1$.
Table 13.2: Data on Commute Distances and Times

11 Example 7 – Solution
Step 1
a. Parameter of interest: The slope, $\beta_1$, of the line of best fit for the population
Step 2
a. Assumptions: The ordered pairs form a random sample, and we will assume that the y values (minutes) at each x (miles) have a normal distribution.
b. Probability distribution and formula: Student’s t-distribution and formula (13.14).

12 Example 7 – Solution
c. Level of confidence: $1 - \alpha = 0.95$
Step 3
Sample information: n = 15, $b_1 = 1.89$, $s_{b_1}^2 = 0.0813$
Step 4
a. Confidence coefficients: From Table 6 in Appendix B, we find $t(\text{df}, \alpha/2) = t(13, 0.025) = 2.16$

13 Example 7 – Solution
b. Maximum error of estimate: We use formula (13.14) to find $E = t(n - 2, \alpha/2) \cdot s_{b_1}$:
$E = (2.16) \cdot \sqrt{0.0813} = 0.62$
c. Lower and upper confidence limits: $b_1 - E$ to $b_1 + E$
1.89 – 0.62 to 1.89 + 0.62
Thus, 1.27 to 2.51 is the 95% confidence interval for $\beta_1$.

14 Example 7 – Solution
Step 5
Confidence interval: We can say that the slope of the line of best fit of the population from which the sample was drawn is between 1.27 and 2.51 with 95% confidence. That is, we are 95% confident that, on average, every extra mile will add between 1.27 minutes (1 min, 16 sec) and 2.51 minutes (2 min, 31 sec) to the commute.
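The arithmetic of Steps 3 to 5 can be checked with a short Python sketch. The helper name `slope_confidence_interval` is hypothetical, and the slope variance 0.0813 is taken here as an assumption because it reproduces the reported maximum error E = 0.62; the critical value 2.16 and the slope 1.89 come from the example.

```python
from math import sqrt


def slope_confidence_interval(b1, s_b1_sq, t_crit):
    """Return (lower, upper, E) for b1 +/- t_crit * s_b1 (hypothetical helper)."""
    e = t_crit * sqrt(s_b1_sq)          # maximum error of estimate
    return b1 - e, b1 + e, e


# b1 and t(13, 0.025) are from the example; 0.0813 is an assumed variance estimate.
lower, upper, e = slope_confidence_interval(1.89, 0.0813, 2.16)
print(round(e, 2), round(lower, 2), round(upper, 2))  # 0.62 1.27 2.51
```

Passing the critical value in as an argument keeps the sketch independent of any statistics package; the value is read from a t-table, as in the example.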

15 Hypothesis-Testing Procedure

16 Hypothesis-Testing Procedure
We are now ready to test the hypothesis $\beta_1 = 0$. That is, we want to determine whether the equation of the line of best fit is of any real value in predicting y. For this hypothesis test, the null hypothesis is always $H_o: \beta_1 = 0$. It will be tested using Student’s t-distribution with df = n – 2 and the test statistic $t\star$ found using formula (13.15):

$$t\star = \frac{b_1 - \beta_1}{s_{b_1}} \tag{13.15}$$

17 Example 9 – One-tailed Hypothesis Test for the Slope of the Regression Line
Suppose you move to a new city and find a job. You will, of course, be concerned about the problems you will face commuting to and from work. For example, you would like to know how long it will take you to drive to work each morning. Let’s use “one-way distance to work” as a measure of where you live. You live x miles away from work and want to know how long it will take you to commute each day. Your new employer, foreseeing this question, has already collected a random sample of data to be used in answering your question. Fifteen of your new co-workers were asked to give their one-way travel times and distances to work.

18 Example 9 – One-tailed Hypothesis Test for the Slope of the Regression Line
The resulting data are shown in Table 13.2. (For convenience, the data have been arranged so that the x values are in numerical order.) Find the line of best fit and the variance of y about the line of best fit, $s_e^2$.
Table 13.2: Data on Commute Distances and Times

19 Example 9 – One-tailed Hypothesis Test for the Slope of the Regression Line
Is the slope of the line of best fit significant enough to show that one-way distance is useful in predicting one-way travel time? Use $\alpha = 0.05$.
Solution:
Step 1
a. Parameter of interest: $\beta_1$, the slope of the line of best fit for the population
b. Statement of hypotheses: $H_o: \beta_1 = 0$ (This implies that x is of no use in predicting y; that is, $\hat{y} = \bar{y}$ would be as effective.) The alternative hypothesis can be either one-tailed or two-tailed. If we suspect that the slope is positive, a one-tailed test is appropriate.

20 Example 9 – Solution
$H_a: \beta_1 > 0$ (We expect travel time y to increase as the distance x increases.)
Step 2
a. Assumptions: The ordered pairs form a random sample, and we will assume that the y values (minutes) at each x (miles) have a normal distribution.
b. Probability distribution and test statistic: The t-distribution with df = n – 2 = 13, and the test statistic $t\star$ from formula (13.15)
c. Level of significance: $\alpha = 0.05$

21 Example 9 – Solution
Step 3
a. Sample information: n = 15, $b_1 = 1.89$, and $s_{b_1}^2 = 0.0813$
b. Test statistic: Using formula (13.15), we find the observed value of t:

$$t\star = \frac{b_1 - \beta_1}{s_{b_1}} = \frac{1.89 - 0}{\sqrt{0.0813}} = 6.63$$
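As a quick check of the test statistic, the calculation can be sketched in Python. The function name `slope_t_statistic` is hypothetical, and the slope variance 0.0813 is assumed here because it is consistent with the observed value of 6.63 used later in the solution.

```python
from math import sqrt


def slope_t_statistic(b1, s_b1_sq, beta1_null=0.0):
    """Observed t for H_o: beta_1 = beta1_null (hypothetical helper)."""
    return (b1 - beta1_null) / sqrt(s_b1_sq)


# b1 = 1.89 is from the example; 0.0813 is an assumed variance estimate.
t_star = slope_t_statistic(1.89, 0.0813)
print(round(t_star, 2))  # 6.63
```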

22 Example 9 – Solution
Step 4
Probability Distribution:
p-Value:
a. Use the right-hand tail because $H_a$ expresses concern for values related to “positive.” $P = P(t\star > 6.63, \text{with df} = 13)$, as shown in the figure.

23 Example 9 – Solution
To find the p-value, use one of three methods:
1. Use Table 6 (Appendix B) to place bounds on the p-value: P < [bound not shown]
2. Use Table 7 (Appendix B) to place bounds on the p-value: P < [bound not shown]
3. Use a computer or calculator to find the p-value: P = [value not shown]
b. The p-value is smaller than the level of significance, $\alpha$.
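The third method, finding the p-value by computer, might look like the following sketch. It uses only the Python standard library and integrates the Student's t density numerically with Simpson's rule; the function names `t_pdf` and `t_upper_tail` are made up for this illustration, and a statistics package would normally be used instead.

```python
from math import gamma, pi, sqrt


def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def t_upper_tail(t_star, df, steps=20000):
    """Approximate P(T > t_star) for t_star >= 0.

    Integrates the density over [0, t_star] with Simpson's rule
    (steps must be even) and subtracts the area from 0.5.
    """
    h = t_star / steps
    total = t_pdf(0.0, df) + t_pdf(t_star, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_pdf(i * h, df)
    return 0.5 - total * h / 3


# Observed statistic and degrees of freedom from the example.
p_value = t_upper_tail(6.63, 13)
print(p_value < 0.05)  # True
```

As a sanity check on the sketch, `t_upper_tail(2.16, 13)` comes out close to 0.025, matching the critical value t(13, 0.025) = 2.16 used in Example 7.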

24 Example 9 – Solution
Classical:
a. The critical region is the right-hand tail because $H_a$ expresses concern for values related to “positive.” The critical value is found in Table 6: $t(13, 0.05) = 1.77$
b. $t\star$ is in the critical region, as shown in red in the figure.

25 Example 9 – Solution
Step 5
a. Decision: Reject $H_o$.
b. Conclusion: At the 0.05 level of significance, we conclude that the slope of the line of best fit in the population is greater than zero. The evidence indicates that there is a linear relationship and that the one-way distance (x) is useful in predicting the travel time to work (y).