Copyright © 2013, 2009, and 2007, Pearson Education, Inc. Chapter 12: Analyzing the Association Between Quantitative Variables: Regression Analysis Section 12.1 Model How Two Variables Are Related

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 3 Regression Analysis The first step of a regression analysis is to identify the response and explanatory variables.  We use y to denote the response variable.  We use x to denote the explanatory variable.

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 4 The Scatterplot The first step in answering the question of association is to look at the data. A scatterplot is a graphical display of the relationship between the response variable (y) and the explanatory variable (x).

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 5 Example: The Strength Study An experiment was designed to measure the strength of female athletes. The goal of the experiment was to determine if there is an association between the maximum number of pounds that each individual athlete could bench press and the number of 60-pound bench presses that athlete could do.

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 6 57 high school female athletes participated in the study. The data consisted of the following variables:  x: the number of 60-pound bench presses an athlete could do.  y: maximum bench press. Example: The Strength Study

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 7 For the 57 females in this study, these variables are summarized by:  x: mean = 11.0, st. deviation = 7.1  y: mean = 79.9 lbs, st. dev. = 13.3 lbs Example: The Strength Study

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 8 Figure 12.1 Scatterplot for y=Maximum Bench Press and x=Number of 60-lb. Bench Presses. Example: The Strength Study

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 9 The Regression Line Equation When the scatterplot shows a linear trend, a straight line can be fitted through the data points to describe that trend. The regression line is $\hat{y} = a + bx$, where $\hat{y}$ is the predicted value of the response variable y, $a$ is the y-intercept, and $b$ is the slope.

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 10 Example: Regression Line Predicting Maximum Bench Press Table 12.1 MINITAB Printout for Regression Analysis of y=Maximum Bench Press (BP) and x =Number of 60-Pound Bench Presses (BP_60). TI-83+/84 output

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 11 The MINITAB output shows the following regression equation: $\widehat{BP} = 63.5 + 1.49(BP\_60)$. The y-intercept is 63.5 and the slope is 1.49. The slope of 1.49 tells us that predicted maximum bench press increases by about 1.5 pounds for every additional 60-pound bench press an athlete can do. Example: Regression Line Predicting Maximum Bench Press
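To make the prediction equation concrete, here is a minimal Python sketch (not part of the original slides) that encodes the fitted line from the MINITAB output:

```python
# Prediction equation reported by MINITAB: BP-hat = 63.5 + 1.49 * BP_60
def predict_max_bench_press(bp_60):
    """Predicted maximum bench press (lbs) for an athlete who can do
    bp_60 repetitions of a 60-pound bench press."""
    return 63.5 + 1.49 * bp_60

# An athlete at the sample mean of x (11 repetitions) is predicted to press
# about 79.9 lbs, which matches the sample mean of y.
print(predict_max_bench_press(11))
```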

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 12 Outliers Check for outliers by plotting the data. The regression line can be pulled toward an outlier and away from the general trend of points.

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 13 Influential Points An observation can be influential in affecting the regression line when one or both of the following happen:  Its x value is low or high compared to the rest of the data.  It does not fall in the straight-line pattern that the rest of the data follow.

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 14 Residuals are Prediction Errors The regression equation is often called a prediction equation. The difference between an observed outcome and its predicted value is the prediction error, called a residual.

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 15 SUMMARY: Review of Residuals Each observation has a residual. A residual is the vertical distance between the data point and the regression line. The smaller the distance, the better the prediction.

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 16 We can summarize how near the regression line the data points fall by the sum of squared residuals, $\sum(\text{residual})^2 = \sum(y - \hat{y})^2$. The regression line has the smallest sum of squared residuals and is called the least squares line. SUMMARY: Review of Residuals
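The least squares property can be checked numerically. The sketch below uses made-up (x, y) values (the raw strength-study data are not reproduced on these slides) and compares the sum of squared residuals of the numpy least squares fit with that of a slightly different line:

```python
import numpy as np

# Made-up illustrative data (not the strength-study data)
x = np.array([2.0, 5.0, 8.0, 11.0, 14.0, 20.0])
y = np.array([66.0, 70.0, 75.0, 80.0, 85.0, 92.0])

def sum_squared_residuals(a, b):
    """Sum of squared residuals for the candidate line y-hat = a + b*x."""
    return np.sum((y - (a + b * x)) ** 2)

b_ls, a_ls = np.polyfit(x, y, 1)                 # least squares slope and intercept
print(sum_squared_residuals(a_ls, b_ls))         # smallest possible value
print(sum_squared_residuals(a_ls + 1.0, b_ls))   # any other line does worse
```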

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 17 Regression Model: A Line Describes How the Mean of y Depends on x At a given value of x, the equation $\hat{y} = a + bx$:  Predicts a single value of the response variable.  But… we should not expect all subjects at that value of x to have the same value of y, because variability occurs in the y values.

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 18 SUMMARY: The Regression Line The regression line connects the estimated means of y at the various x values. In summary, $\hat{y} = a + bx$ describes the relationship between x and the estimated means of y at the various values of x.

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 19 The population regression equation describes the relationship in the population between x and the means of y. The equation is denoted by: $\mu_y = \alpha + \beta x$. The Population Regression Equation

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 20 In the population regression equation, $\alpha$ is a population y-intercept and $\beta$ is a population slope.  These are parameters, so in practice their values are unknown. In practice we estimate the population regression equation using the prediction equation $\hat{y} = a + bx$ for the sample data. The Population Regression Equation

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 21 The population regression equation merely approximates the actual relationship between x and the population means of y. It is a model.  A model is a simple approximation for how variables relate in the population. The Population Regression Equation

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 22 The Regression Model Figure 12.2 The Regression Model for the Means of y Is a Simple Approximation for the True Relationship. Question: Can you sketch a true relationship for which this model is a very poor approximation?

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 23 If the true relationship is far from a straight line, this regression model may be a poor one. Figure 12.3 The Straight-Line Regression Model Provides a Poor Approximation When the Actual Relationship Is Highly Nonlinear. Question: What type of mathematical function might you consider using for a regression model in this case? The Regression Model

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 24 Variability about the Line At each fixed value of x, variability occurs in the y values around their mean, $\mu_y$. The probability distribution of y values at a fixed value of x is a conditional distribution. At each value of x, there is a conditional distribution of y values. An additional parameter, $\sigma$, describes the standard deviation of each conditional distribution.

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 25 A Statistical Model A statistical model never holds exactly in practice. It is merely an approximation for reality. Even though it does not describe reality exactly, a model is useful if the true relationship is close to what the model predicts.

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. Chapter 12 Analyzing the Association Between Quantitative Variables: Regression Analysis Section 12.2 Describe Strength of Association

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 27 SUMMARY: Properties of the Correlation, r The correlation, denoted by r, describes linear association. The correlation ‘r’ has the same sign as the slope ‘b’. The correlation ‘r’ always falls between -1 and +1. The larger the absolute value of r, the stronger the linear association.

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 28 Correlation and Slope We can’t use the slope to describe the strength of the association between two variables because the slope’s numerical value depends on the units of measurement. The correlation is a standardized version of the slope. The correlation does not depend on units of measurement.

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 29 The correlation and the slope are related in the following way: $r = b\left(\frac{s_x}{s_y}\right)$. Correlation and Slope

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 30 Example: Predicting Strength For the female athlete strength study:  x: number of 60-pound bench presses  y: maximum bench press  x: mean = 11.0, st. dev. = 7.1  y: mean = 79.9 lbs., st. dev. = 13.3 lbs. Regression equation: $\hat{y} = 63.5 + 1.49x$

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 31 $r = b\left(\frac{s_x}{s_y}\right) = 1.49\left(\frac{7.1}{13.3}\right) = 0.80$. The variables have a strong, positive association. Example: Predicting Strength
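A quick check of this relationship with the summary statistics quoted above (a sketch, not from the slides):

```python
# Correlation from the slope and the standard deviations, strength study
b   = 1.49   # slope of the fitted regression line
s_x = 7.1    # st. dev. of x = number of 60-lb bench presses
s_y = 13.3   # st. dev. of y = maximum bench press (lbs)

r = b * (s_x / s_y)
print(round(r, 2))   # about 0.80: a strong, positive linear association
```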

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 32 Another way to describe the strength of association refers to how close predictions for y tend to be to observed y values. The variables are strongly associated if you can predict y much better by substituting x values into the prediction equation than by merely using the sample mean and ignoring x. The Squared Correlation

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 33 Consider the prediction error: the difference between the observed and predicted values of y.  Using the regression line to make a prediction, each error is $y - \hat{y}$.  Using only the sample mean, $\bar{y}$, to make a prediction, each error is $y - \bar{y}$. The Squared Correlation

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 34 When we predict y using $\bar{y}$ (that is, ignoring x), the error summary equals $\sum(y - \bar{y})^2$. This is called the total sum of squares. The Squared Correlation

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 35 When we predict y using x with the regression equation, the error summary is $\sum(y - \hat{y})^2$. This is called the residual sum of squares. The Squared Correlation

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 36 When a strong linear association exists, the regression equation predictions tend to be much better than the predictions using $\bar{y}$. We measure the proportional reduction in error and call it $r^2$: $r^2 = \dfrac{\sum(y - \bar{y})^2 - \sum(y - \hat{y})^2}{\sum(y - \bar{y})^2}$. The Squared Correlation

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 37 We use the notation $r^2$ for this measure because it equals the square of the correlation r. The Squared Correlation

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 38 Example: Strength Study For the female athlete strength study:  x: number of 60-pound bench presses  y: maximum bench press  The correlation value was found to be r = 0.80. We can calculate $r^2$ from r: $r^2 = (0.80)^2 = 0.64$. For predicting maximum bench press, the regression equation has 64% less error than $\bar{y}$ has.
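The proportional-reduction-in-error interpretation can be verified numerically. This sketch uses made-up data (the raw study data are not shown on these slides) and checks that (TSS − RSS)/TSS equals the squared correlation:

```python
import numpy as np

# Made-up illustrative data
x = np.array([2.0, 5.0, 8.0, 11.0, 14.0, 20.0])
y = np.array([66.0, 70.0, 75.0, 80.0, 85.0, 92.0])

b, a = np.polyfit(x, y, 1)              # least squares fit
y_hat = a + b * x

tss = np.sum((y - y.mean()) ** 2)       # error when predicting with y-bar
rss = np.sum((y - y_hat) ** 2)          # error when predicting with the line
r_squared = (tss - rss) / tss           # proportional reduction in error

print(r_squared)
print(np.corrcoef(x, y)[0, 1] ** 2)     # agrees with the squared correlation
```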

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 39 The Squared Correlation SUMMARY: Properties of $r^2$:  $r^2$ falls between 0 and 1.  $r^2 = 1$ when $\sum(y - \hat{y})^2 = 0$; this happens only when all the data points fall exactly on the regression line.  $r^2 = 0$ when $\sum(y - \hat{y})^2 = \sum(y - \bar{y})^2$; this happens when the slope $b = 0$, in which case each $\hat{y} = \bar{y}$.  The closer $r^2$ is to 1, the stronger the linear association: the more effective the regression equation is compared to $\bar{y}$ in predicting y.

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 40 SUMMARY: Correlation r and Its Square Both r and $r^2$ describe the strength of association. r falls between -1 and +1.  It represents the slope of the regression line when x and y have equal standard deviations. $r^2$ falls between 0 and 1.  It summarizes the reduction in the sum of squared errors in predicting y using the regression line instead of using $\bar{y}$.

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. Chapter 12 Analyzing the Association Between Quantitative Variables: Regression Analysis Section 12.3 Make Inferences About the Association

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 42 Descriptive and Inferential Parts of Regression The sample regression equation, r, and $r^2$ are descriptive parts of a regression analysis. The inferential parts of regression use the tools of confidence intervals and significance tests to provide inference about the regression equation, the correlation and r-squared in the population of interest.

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 43 SUMMARY: Basic assumption for using regression line for description:  The population means of y at different values of x have a straight-line relationship with x, that is: $\mu_y = \alpha + \beta x$.  This assumption states that a straight-line regression model is valid.  This can be verified with a scatterplot. Assumptions for Regression Analysis

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 44 SUMMARY: Extra assumptions for using regression to make statistical inference:  The data were gathered using randomization.  The population values of y at each value of x follow a normal distribution, with the same standard deviation at each x value. Assumptions for Regression Analysis

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 45 Models, such as the regression model, merely approximate the true relationship between the variables. A relationship will not be exactly linear, with exactly normal distributions for y at each x and with exactly the same standard deviation of y values at each x value. Assumptions for Regression Analysis

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 46 Testing Independence between Quantitative Variables Suppose that the slope $\beta$ of the regression line equals 0. Then…  The mean of y is identical at each x value.  The two variables, x and y, are statistically independent:  The outcome for y does not depend on the value of x.  It does not help us to know the value of x if we want to predict the value of y.

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 47 Figure 12.8 Quantitative Variables x and y Are Statistically Independent When the True Slope $\beta = 0$. Each normal curve shown here represents the variability in y values at a particular value of x. When $\beta = 0$, the normal distribution of y is the same at each value of x. Question: How can you express the null hypothesis of independence between x and y in terms of a parameter from the regression model? Testing Independence between Quantitative Variables

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 48 SUMMARY: Steps of Two-Sided Significance Test about a Population Slope $\beta$: 1. Assumptions:  The population satisfies the regression line $\mu_y = \alpha + \beta x$  Data obtained using randomization  The population values of y at each value of x follow a normal distribution, with the same standard deviation at each x value. Testing Independence between Quantitative Variables

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 49 SUMMARY: Steps of Two-Sided Significance Test about a Population Slope $\beta$: 2. Hypotheses: $H_0\!: \beta = 0$, $H_a\!: \beta \neq 0$. 3. Test statistic: $t = \dfrac{b - 0}{se}$  Software supplies the sample slope b and its standard error se. Testing Independence between Quantitative Variables

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 50 SUMMARY: Steps of Two-Sided Significance Test about a Population Slope $\beta$: 4. P-value: Two-tail probability of t test statistic value more extreme than observed: Use t distribution with df = n - 2. 5. Conclusions: Interpret P-value in context. If decision needed, reject $H_0$ if P-value ≤ significance level. Testing Independence between Quantitative Variables

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 51 Example: 60-Pound Strength and Bench Presses Table 12.4 MINITAB Printout for Regression Analysis of y=Maximum Bench Press (BP) and x=Number of 60-Pound Bench Presses (BP_60).

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 52 Conduct a two-sided significance test of the null hypothesis of independence. 1. Assumptions:  A scatterplot of the data revealed a linear trend, so the straight-line regression model seems appropriate.  The points have a similar spread at different x values.  The sample was a convenience sample, not a random sample, so this is a concern. Example: 60-Pound Strength and Bench Presses

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 53 2. Hypotheses: $H_0\!: \beta = 0$, $H_a\!: \beta \neq 0$. 3. Test statistic: $t = (b - 0)/se$, reported on the printout in Table 12.4. 4. P-value: the two-tail P-value is very small, so we reject $H_0$. Conclusion: An association exists between the number of 60-pound bench presses and maximum bench press. Example: 60-Pound Strength and Bench Presses
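As a sketch of the arithmetic behind this test: using only the summary values on these slides (r = 0.80, n = 57) and the identity t = r√(n − 2)/√(1 − r²), which is algebraically equivalent to t = b/se, we can reproduce the test statistic and P-value approximately:

```python
import math
from scipy import stats

r, n = 0.80, 57                  # correlation and sample size from the study
df = n - 2

# t statistic for H0: beta = 0, computed from r (equivalent to b / se)
t = r * math.sqrt(df) / math.sqrt(1 - r ** 2)
p_value = 2 * stats.t.sf(abs(t), df)     # two-sided P-value

print(round(t, 1))     # roughly 9.9
print(p_value)         # essentially 0, so H0 (independence) is rejected
```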

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 54 A Confidence Interval for $\beta$ A small P-value in the significance test of $H_0\!: \beta = 0$ suggests that the population regression line has a nonzero slope. To learn how far the slope falls from 0, we construct a confidence interval: $b \pm t_{.025}(se)$, where the t-score has df = n - 2.

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 55 Example: Estimating the Slope for Predicting Maximum Bench Press Construct a 95% confidence interval for $\beta$: $b \pm t_{.025}(se)$. Based on a 95% CI of (1.2, 1.8), we can conclude, on average, the maximum bench press increases by between 1.2 and 1.8 pounds for each additional 60-pound bench press that an athlete can do.

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 56 Let's estimate the effect of a 10-unit increase in x:  Since the 95% CI for $\beta$ is (1.2, 1.8), the 95% CI for $10\beta$ is (12, 18).  On the average, we infer that the maximum bench press increases by at least 12 pounds and at most 18 pounds, for an increase of 10 in the number of 60-pound bench presses. Example: Estimating the Slope for Predicting Maximum Bench Press
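A sketch of this computation, assuming a slope standard error of about 0.15 (a value consistent with the interval (1.2, 1.8) quoted above, but not stated explicitly on these slides):

```python
from scipy import stats

b, n = 1.49, 57          # sample slope and sample size
se = 0.15                # assumed standard error of the slope (see lead-in)
t_crit = stats.t.ppf(0.975, df=n - 2)    # t-score for a 95% CI, df = n - 2

lower, upper = b - t_crit * se, b + t_crit * se
print(round(lower, 2), round(upper, 2))            # roughly (1.2, 1.8)
print(round(10 * lower, 1), round(10 * upper, 1))  # CI for a 10-unit increase in x
```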

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. Chapter 12 Analyzing the Association Between Quantitative Variables: Regression Analysis Section 12.4 How the Data Vary Around the Regression Line

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 58 Residuals and Standardized Residuals A residual is a prediction error – the difference between an observed outcome and its predicted value.  The magnitude of these residuals depends on the units of measurement for y. A standardized version of the residual does not depend on the units.

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 59 Standardized residual = $\dfrac{y - \hat{y}}{se \text{ of } (y - \hat{y})}$. The se formula is complex, so we rely on software to find it. A standardized residual indicates how many standard errors a residual falls from 0. If the relationship is truly linear and the standardized residuals have approximately a bell-shaped distribution, observations with standardized residuals larger than 3 in absolute value often represent outliers. Standardized Residuals
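Because the standard-error formula is complicated, standardized residuals are usually read off software output. Below is a small sketch using statsmodels on made-up data (the GPA data from the next example are not reproduced here); `resid_studentized_internal` is statsmodels' name for internally standardized residuals:

```python
import numpy as np
import statsmodels.api as sm

# Made-up data: the last observation is far below the linear trend
x = np.array([2.1, 2.5, 2.9, 3.2, 3.5, 3.8, 4.0, 3.0])
y = np.array([2.0, 2.4, 2.7, 3.1, 3.3, 3.7, 3.9, 1.5])

fit = sm.OLS(y, sm.add_constant(x)).fit()
std_resid = fit.get_influence().resid_studentized_internal

# Flag observations whose standardized residual is large in absolute value
for i, z in enumerate(std_resid):
    if abs(z) > 2:
        print(f"observation {i}: standardized residual = {z:.2f}")
```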

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 60 Example: Detecting an Underachieving College Student Data were collected on a sample of 59 students at the University of Georgia. Two of the variables were:  CGPA: College Grade Point Average  HSGPA: High School Grade Point Average

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 61 A regression equation was created from the data using software:  x: HSGPA  y: CGPA Example: Detecting an Underachieving College Student

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 62 Table 12.6 Observations with Large Standardized Residuals in Student GPA Regression Analysis, as Reported by MINITAB Example: Detecting an Underachieving College Student

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 63 Consider the reported standardized residual of -3.14.  This indicates that the residual is 3.14 standard errors below 0.  This student’s actual college GPA is quite far below what the regression line predicts. Example: Detecting an Underachieving College Student

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 64 Analyzing Large Standardized Residuals Does it fall well away from the linear trend that the other points follow? Does it have too much influence on the results? Note: Some large standardized residuals may occur just because of ordinary random variability - even if the model is perfect, we’d expect about 5% of the standardized residuals to have absolute values > 2 by chance.

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 65 Histogram of Residuals A histogram of residuals or standardized residuals is a good way of detecting unusual observations. A histogram is also a good way of checking the assumption that the conditional distribution of y at each x value is normal.  Look for a bell-shaped histogram.

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 66 Suppose the histogram is not bell-shaped:  The distribution of the residuals is not normal. However….  Two-sided inferences about the slope parameter still work quite well.  The t-inferences are robust. Histogram of Residuals

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 67 The Residual Standard Deviation For statistical inference, the regression model assumes that the conditional distribution of y at a fixed value of x is normal, with the same standard deviation at each x. This standard deviation, denoted by $\sigma$, refers to the variability of y values for all subjects with the same x value.

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 68 The estimate of $\sigma$, obtained from the data, is called the residual standard deviation: $s = \sqrt{\dfrac{\sum(y - \hat{y})^2}{n - 2}}$. The Residual Standard Deviation

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 69 Example: Variability of the Athletes’ Strengths From MINITAB output, we obtain s, the residual standard deviation of y: s = 8.0. For any given x value, we estimate the mean y value using the regression equation and we estimate the standard deviation using s = 8.0.
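A sketch of how s is computed from the residuals (made-up data; for the strength study the software reports s ≈ 8.0):

```python
import numpy as np

# Made-up illustrative data
x = np.array([2.0, 5.0, 8.0, 11.0, 14.0, 20.0])
y = np.array([66.0, 70.0, 75.0, 80.0, 85.0, 92.0])

b, a = np.polyfit(x, y, 1)
residuals = y - (a + b * x)

# Residual standard deviation: sqrt( sum of squared residuals / (n - 2) )
s = np.sqrt(np.sum(residuals ** 2) / (len(y) - 2))
print(s)   # estimates the st. dev. of y at each fixed value of x
```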

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 70 Confidence Interval for $\mu_y$ We can estimate $\mu_y$, the population mean of y at a given value of x, by $\hat{y}$. We can construct a 95% confidence interval for $\mu_y$ using: $\hat{y} \pm t_{.025}(se \text{ of } \hat{y})$  where the t-score has df = n - 2.

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 71 Prediction Interval for y The estimate $\hat{y}$ for the mean of y at a fixed value of x is also a prediction for an individual outcome y at that fixed value of x. Most regression software will form this interval, called a prediction interval for y, within which an individual outcome y is likely to fall.

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 72 Prediction Interval for y vs. Confidence Interval for $\mu_y$ The prediction interval for y is an inference about where individual observations fall.  Use a prediction interval for y if you want to predict where a single observation on y will fall for a particular x value.

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 73 The confidence interval for $\mu_y$ is an inference about where a population mean falls.  Use a confidence interval for $\mu_y$ if you want to estimate the mean of y for all individuals having a particular x value. Prediction Interval for y vs. Confidence Interval for $\mu_y$

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 74 Note that the prediction interval is wider than the confidence interval - you can estimate a population mean more precisely than you can predict a single observation. Caution: In order for these intervals to be valid, the true relationship must be close to linear with about the same variability of y-values at each fixed x-value. Prediction Interval for y vs. Confidence Interval for $\mu_y$

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 75 Maximum Bench Press and Estimating its Mean Table 12.7 MINITAB Output for Confidence Interval (CI) and Prediction Interval (PI) on Maximum Bench Press for Athletes Who Do Eleven 60-Pound Bench Presses before Fatigue.

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 76 Use the MINITAB output to find and interpret a 95% CI for the population mean of the maximum bench press values for all female high school athletes who can do x = 11 sixty-pound bench presses. For all female high school athletes who can do 11 sixty-pound bench presses, we estimate the mean of their maximum bench press values falls between 78 and 82 pounds. Maximum Bench Press and Estimating its Mean

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 77 Use the MINITAB output to find and interpret a 95% Prediction Interval for a single new observation on the maximum bench press for a randomly chosen female high school athlete who can do x = 11 sixty-pound bench presses. For all female high school athletes who can do 11 sixty-pound bench presses, we predict that 95% of them have maximum bench press values between 64 and 96 pounds. Maximum Bench Press and Estimating its Mean
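These two intervals can be reproduced approximately from the summary values on these slides (ŷ = 63.5 + 1.49x, s = 8.0, n = 57, x̄ = 11.0), using the standard simple-regression standard-error formulas, which are not shown on the slides. Because x₀ = 11 equals x̄, the (x₀ − x̄)²/Σ(x − x̄)² term vanishes:

```python
import math
from scipy import stats

n, s, x0 = 57, 8.0, 11.0
y_hat = 63.5 + 1.49 * x0                 # predicted mean of y at x0
t_crit = stats.t.ppf(0.975, df=n - 2)

# x0 equals the sample mean of x, so the (x0 - x_bar)^2 / Sxx term is zero
se_mean = s * math.sqrt(1.0 / n)         # se of the estimated mean of y at x0
se_pred = s * math.sqrt(1.0 + 1.0 / n)   # se for predicting a single y at x0

ci = (y_hat - t_crit * se_mean, y_hat + t_crit * se_mean)
pi = (y_hat - t_crit * se_pred, y_hat + t_crit * se_pred)
print([round(v, 1) for v in ci])   # about (78, 82): CI for the mean
print([round(v, 1) for v in pi])   # about (64, 96): wider prediction interval
```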

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. Chapter 12 Analyzing the Association Between Quantitative Variables: Regression Analysis Section 12.5 Exponential Regression: A Model for Nonlinearity

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 79 Nonlinear Regression Models If a scatterplot indicates substantial curvature in a relationship, then equations that provide curvature are needed.  Occasionally a scatterplot has a parabolic appearance: as x increases, y increases and then goes back down.  More often, y tends to continually increase or continually decrease, but the trend shows curvature.

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 80 Example: Exponential Growth in Population Size Since 2000, the population of the U.S. has been growing at a rate of 2% a year.  The population size in 2010 was 309 million  The population size in 2011 was 309 x 1.02  The population size in 2012 was 309 x (1.02)^2  …  The population size in 2020 is estimated to be 309 x (1.02)^10  This is called exponential growth
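The arithmetic of this example, as a short loop (a sketch, using the figures quoted above):

```python
# Exponential growth at 2% per year from a 2010 population of 309 million
pop_2010, growth = 309, 1.02
for years in (1, 2, 10):
    print(2010 + years, round(pop_2010 * growth ** years, 1))   # millions
```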

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 81 An exponential regression model has the formula: $\mu_y = \alpha\beta^x$  For the mean of y at a given value of x, where $\alpha$ and $\beta$ are parameters. Exponential Regression Model

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 82 In the exponential regression equation, the explanatory variable x appears as the exponent of a parameter. The mean $\mu_y$ and the parameter $\beta$ can take only positive values. Exponential Regression Model

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 83  As x increases, the mean $\mu_y$ increases when $\beta > 1$.  It continually decreases when $0 < \beta < 1$. Figure: The Exponential Regression Curve for $\mu_y = \alpha\beta^x$. Question: Why does $\mu_y$ decrease as x increases when $\beta < 1$, even though $\alpha > 0$? Exponential Regression Model

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 84 For exponential regression, the logarithm of the mean is a linear function of x. When the exponential regression model holds, a plot of the log of the y values versus x should show an approximate straight-line relation with x. Exponential Regression Model
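This is also how the model is usually fitted in practice: regress log(y) on x, then exponentiate the intercept and slope to recover estimates of α and β. A sketch on simulated data (the Facebook figures in the next example are not reproduced here):

```python
import numpy as np

# Simulated data following mu_y = alpha * beta**x with alpha = 5, beta = 1.3
rng = np.random.default_rng(0)
x = np.arange(10, dtype=float)
y = 5.0 * 1.3 ** x * rng.lognormal(0.0, 0.05, size=x.size)

# Straight-line fit on the log scale: log(y) = log(alpha) + x * log(beta)
slope, intercept = np.polyfit(x, np.log(y), 1)
alpha_hat, beta_hat = np.exp(intercept), np.exp(slope)
print(round(alpha_hat, 2), round(beta_hat, 2))   # close to 5 and 1.3
```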

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 85 Example: Explosion in Number of Facebook Users Table 12.9 Number of Facebook Users Worldwide (in Millions).

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 86 Figure: Plot of Number of Facebook Users (millions) from December 2004 onward (the period covered by Table 12.9). Example: Explosion in Number of Facebook Users

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 87 Figure: Plot of the Log of the Number of Facebook Users over the same period. When the log of the response has an approximate straight-line relationship with the explanatory variable, the exponential regression model is appropriate. Example: Explosion in Number of Facebook Users

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 88 Using regression software, we can create the exponential regression equation:  x: the number of days since December 1, 2004. Start with x = 0 for 12/1/2004, then x = 1 for 12/2/2004, etc.  y: number of Facebook users  Equation: $\hat{\mu}_y = \hat{\alpha}\hat{\beta}^x$, with $\hat{\alpha}$ and $\hat{\beta}$ estimated from the data by the software. Example: Explosion in Number of Facebook Users

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 89 Interpreting Exponential Regression Models In the exponential regression model $\mu_y = \alpha\beta^x$,  the parameter $\alpha$ represents the mean value of y when x = 0;  the parameter $\beta$ represents the multiplicative effect on the mean of y for a one-unit increase in x.

Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 90 In this model: The predicted number of Facebook users on 12/1/2004 (for which x = 0) is $\hat{\alpha}$ million. The predicted number of Facebook users on 12/1/2015 is $\hat{\alpha}$ times $\hat{\beta}$ raised to the number of days between 12/1/2004 and 12/1/2015, which is approximately 120 billion people. Example: Explosion in Number of Facebook Users