Prediction concerning the response Y. Where does this topic fit in? Model formulation Model estimation Model evaluation Model use.

Prediction concerning the response Y

Where does this topic fit in? Model formulation, model estimation, model evaluation, model use.

Translating two research questions into two reasonable statistical answers
What is the mean weight, μ, of all American women, aged 18-24?
– If we want to estimate μ, what would be a good estimate?
What is the weight, y, of a randomly selected American woman, aged 18-24?
– If we want to predict y, what would be a good prediction?

Could we do better by taking into account a person’s height?

One thing to estimate (μ_y) and one thing to predict (y)

Two different research questions
What is the mean response μ_Y when the predictor value is x_h?
What value will a new observation Y_new be when the predictor value is x_h?

Example: Skin cancer mortality and latitude
What is the expected (mean) mortality rate for all locations at 40°N latitude?
What is the predicted mortality rate for 1 new randomly selected location at 40°N?

Example: Skin cancer mortality and latitude

“Point estimators”
The predicted response ŷ_h = b_0 + b_1 x_h is the best answer to each research question. That is, it is:
the best guess of the mean response at x_h
the best guess of a new observation at x_h
But, as always, to be confident in the answer to our research question, we should put an interval around our best guess.
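As a minimal sketch of the point estimate, the snippet below fits a least-squares line and evaluates it at a chosen x_h. The data values, the helper name fit_line, and the value x_h = 3.5 are illustrative assumptions, not taken from the skin cancer example.

```python
from statistics import mean

def fit_line(x, y):
    """Return the least-squares intercept b0 and slope b1 for y = b0 + b1*x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Hypothetical data, chosen only to keep the arithmetic easy to follow.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b0, b1 = fit_line(x, y)
x_h = 3.5                  # predictor value of interest (assumed)
y_hat_h = b0 + b1 * x_h    # same number serves as estimate of the mean and as prediction
print(b0, b1, y_hat_h)
```

The same number answers both research questions; only the interval placed around it differs.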

It is dangerous to “extrapolate” beyond the scope of the model.

A confidence interval for the population mean response μ_Y … when the predictor value is x_h

Again, what are we estimating?

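(1-α)100% t-interval for the mean response μ_Y
Formula in notation: $\hat{y}_h \pm t_{(\alpha/2,\, n-2)} \times \sqrt{MSE \left( \frac{1}{n} + \frac{(x_h - \bar{x})^2}{\sum (x_i - \bar{x})^2} \right)}$
Formula in words: Sample estimate ± (t-multiplier × standard error)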
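The "sample estimate ± (t-multiplier × standard error)" recipe for the mean response can be sketched as follows. The data set, the function name mean_response_ci, and the choice to pass the t-multiplier in by hand (3.182 is the tabled 97.5th percentile of t with 3 degrees of freedom) are assumptions made for illustration.

```python
import math
from statistics import mean

def mean_response_ci(x, y, x_h, t_mult):
    """Return (fit, se_fit, (lower, upper)) for the mean response at x_h."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    # MSE = SSE / (n - 2) estimates the error variance.
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)
    fit = b0 + b1 * x_h
    se_fit = math.sqrt(mse * (1 / n + (x_h - x_bar) ** 2 / sxx))
    return fit, se_fit, (fit - t_mult * se_fit, fit + t_mult * se_fit)

# Hypothetical data; t_mult is t(0.975, n - 2 = 3) read from a t-table.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
fit, se_fit, (lower, upper) = mean_response_ci(x, y, x_h=3.0, t_mult=3.182)
print(fit, se_fit, lower, upper)
```

Passing the multiplier as an argument keeps the sketch free of third-party dependencies; a statistics package would normally look it up from n and the confidence level.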

Example: Skin cancer mortality and latitude
Predicted Values for New Observations
New Obs   Fit   SE Fit   95.0% CI      95.0% PI
                         (144.56, )    (111.23, 188.93)
Values of Predictors for New Observations
New Obs   Lat

Factors affecting the length of the confidence interval for μ_Y
As the confidence level decreases, …
As MSE decreases, …
As the sample size increases, …
The more spread out the predictor values, …
The closer x_h is to the sample mean, …
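One way to explore these factors is to evaluate the standard error of the fit directly while changing one input at a time. The function name se_fit and all numeric inputs below are hypothetical. The confidence level does not appear because it affects only the t-multiplier, not the standard error, and in real data a change in n would also change the sum of squares of x.

```python
import math

def se_fit(mse, n, x_h, x_bar, sxx):
    """Standard error of the estimated mean response at x_h."""
    return math.sqrt(mse * (1 / n + (x_h - x_bar) ** 2 / sxx))

# Baseline: x_h sits exactly at the sample mean of x.
base = se_fit(mse=0.8, n=5, x_h=3.0, x_bar=3.0, sxx=10.0)

# Vary one ingredient at a time (all values are illustrative assumptions).
far_from_mean = se_fit(mse=0.8, n=5, x_h=5.0, x_bar=3.0, sxx=10.0)
smaller_mse = se_fit(mse=0.2, n=5, x_h=3.0, x_bar=3.0, sxx=10.0)
more_spread_x = se_fit(mse=0.8, n=5, x_h=5.0, x_bar=3.0, sxx=40.0)
larger_n = se_fit(mse=0.8, n=50, x_h=3.0, x_bar=3.0, sxx=10.0)

for label, value in [
    ("baseline", base),
    ("x_h far from mean", far_from_mean),
    ("smaller MSE", smaller_mse),
    ("more spread in x (x_h far from mean)", more_spread_x),
    ("larger n", larger_n),
]:
    print(f"{label}: {value:.4f}")
```

Comparing each printed value with the relevant reference case (baseline, or the far-from-mean case for the spread comparison) shows the direction in which the interval length moves.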

Does the estimate of μ_Y when x_h = 1 vary more here …?
[Summary table: Var, N, StDev for yhat(x=1)]

… or here?
[Summary table: Var, N, StDev for yhat(x=1)]

Does the estimate of μ_Y vary more when x_h = 1 or when x_h = 5.5?
[Summary table: Var, N, StDev for yhat(x=1) and yhat(x=5.5)]

Example: Skin cancer mortality and latitude
Predicted Values for New Observations
New Obs   Fit   SE Fit   95.0% CI          95.0% PI
                         (144.6, 155.6)    (111.2, 188.93)
                         (206.9, 236.8)    (180.6, 263.07) X
X denotes a row with X values away from the center
Values of Predictors for New Observations
New Obs   Latitude
Mean of Lat =

When is it okay to use the confidence interval for μ_Y formula?
When x_h is a value within the scope of the model.
– x_h does not have to be one of the actual x values in the data set.
When the “LINE” assumptions are met.
– The formula works okay even if the error terms are only approximately normal.
– If you have a large sample, the error terms can even deviate substantially from normality.

Prediction interval for a new response Y_new

Again, what are we predicting?

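(1-α)100% prediction interval for a new response Y_new
Formula in notation: $\hat{y}_h \pm t_{(\alpha/2,\, n-2)} \times \sqrt{MSE \left( 1 + \frac{1}{n} + \frac{(x_h - \bar{x})^2}{\sum (x_i - \bar{x})^2} \right)}$
Formula in words: Sample prediction ± (t-multiplier × standard error)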
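A sketch of the prediction interval, under the same illustrative assumptions as before: the data are hypothetical, prediction_interval is an invented helper name, and the t-multiplier 3.182 is the tabled value of t(0.975, 3) passed in by hand. The only change from the confidence interval calculation is the extra "1 +" inside the square root.

```python
import math
from statistics import mean

def prediction_interval(x, y, x_h, t_mult):
    """Return (prediction, se_pred, (lower, upper)) for a new response at x_h."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)
    pred = b0 + b1 * x_h
    # The leading 1 accounts for the variation of a single new observation.
    se_pred = math.sqrt(mse * (1 + 1 / n + (x_h - x_bar) ** 2 / sxx))
    return pred, se_pred, (pred - t_mult * se_pred, pred + t_mult * se_pred)

# Hypothetical data; t_mult is t(0.975, n - 2 = 3) read from a t-table.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
pred, se_pred, (pi_lower, pi_upper) = prediction_interval(x, y, x_h=3.0, t_mult=3.182)
print(pred, se_pred, pi_lower, pi_upper)
```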

Example: Skin cancer mortality and latitude
Predicted Values for New Observations
New Obs   Fit   SE Fit   95.0% CI      95.0% PI
                         (144.56, )    (111.23, 188.93)
Values of Predictors for New Observations
New Obs   Lat

When is it okay to use the prediction interval for Y_new formula?
When x_h is a value within the scope of the model.
– x_h does not have to be one of the actual x values in the data set.
When the “LINE” assumptions are met.
– The formula for the prediction interval depends strongly on the assumption that the error terms are normally distributed.

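What’s the difference in the two formulas?
Confidence interval for μ_Y: $\hat{y}_h \pm t_{(\alpha/2,\, n-2)} \times \sqrt{MSE \left( \frac{1}{n} + \frac{(x_h - \bar{x})^2}{\sum (x_i - \bar{x})^2} \right)}$
Prediction interval for Y_new: $\hat{y}_h \pm t_{(\alpha/2,\, n-2)} \times \sqrt{MSE \left( 1 + \frac{1}{n} + \frac{(x_h - \bar{x})^2}{\sum (x_i - \bar{x})^2} \right)}$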

Prediction of Y_new if the mean μ_Y is known
Suppose it were known that the mean skin cancer mortality at x_h = 40°N is 150 deaths per million (with variance 400). What is the predicted skin cancer mortality in Columbus, Ohio?
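With the mean and variance treated as known, the arithmetic can be sketched as below. This assumes mortality at that latitude is normally distributed, which is the "N" of the LINE assumptions. The variable names and the 95% level are illustrative choices.

```python
from statistics import NormalDist

mu_y = 150.0           # known mean at 40°N, from the slide
sigma = 400.0 ** 0.5   # standard deviation = sqrt(variance) = 20

# Assumption: responses at this latitude follow a normal distribution.
dist = NormalDist(mu=mu_y, sigma=sigma)

point_prediction = mu_y        # best single guess for a new location
lower = dist.inv_cdf(0.025)    # 2.5th percentile
upper = dist.inv_cdf(0.975)    # 97.5th percentile
print(point_prediction, round(lower, 1), round(upper, 1))
```

Under these assumptions the best single prediction is the mean itself, and about 95% of new locations at that latitude would fall roughly within two standard deviations of it.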

And then reality sets in
The mean μ_Y is not known.
– Estimate it with the predicted response ŷ_h.
– The cost of using ŷ_h to estimate μ_Y is the variance of ŷ_h.
The variance σ² is not known.
– Estimate it with MSE.

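Variance of the prediction
$\sigma^2 + \mathrm{Var}(\hat{y}_h) = \sigma^2 + \sigma^2 \left( \frac{1}{n} + \frac{(x_h - \bar{x})^2}{\sum (x_i - \bar{x})^2} \right)$
which is estimated by:
$MSE + MSE \left( \frac{1}{n} + \frac{(x_h - \bar{x})^2}{\sum (x_i - \bar{x})^2} \right) = MSE \left( 1 + \frac{1}{n} + \frac{(x_h - \bar{x})^2}{\sum (x_i - \bar{x})^2} \right)$
The variation in the prediction of a new response depends on two components:
1. the variation due to estimating the mean μ_Y with ŷ_h
2. the variation in Y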

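What’s the effect of the difference in the two formulas?
Confidence interval for μ_Y: $\hat{y}_h \pm t_{(\alpha/2,\, n-2)} \times \sqrt{MSE \left( \frac{1}{n} + \frac{(x_h - \bar{x})^2}{\sum (x_i - \bar{x})^2} \right)}$
Prediction interval for Y_new: $\hat{y}_h \pm t_{(\alpha/2,\, n-2)} \times \sqrt{MSE \left( 1 + \frac{1}{n} + \frac{(x_h - \bar{x})^2}{\sum (x_i - \bar{x})^2} \right)}$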

What’s the effect of the difference in the two formulas?
A (1-α)100% confidence interval for μ_Y at x_h will always be narrower than a (1-α)100% prediction interval for Y_new at x_h.
The confidence interval’s standard error can approach 0 (for example, as the sample size grows), whereas the prediction interval’s standard error can never fall below √MSE.
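The limiting behaviour of the two standard errors can be checked numerically. The sketch below simplifies by taking x_h equal to the sample mean (so the distance term vanishes) and by holding MSE fixed at a hypothetical value of 400 while n grows.

```python
import math

mse = 400.0  # hypothetical, held fixed for illustration
results = []
for n in (10, 100, 10_000):
    # With x_h at the sample mean, only the 1/n term remains.
    se_fit = math.sqrt(mse * (1 / n))
    se_pred = math.sqrt(mse * (1 + 1 / n))
    results.append((n, se_fit, se_pred))
    print(f"n={n:>6}  SE(fit)={se_fit:.3f}  SE(pred)={se_pred:.3f}")
```

In this sketch the standard error of the fit shrinks toward 0 as n increases, while the standard error of the prediction levels off near the square root of MSE (20 here).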

Confidence intervals and prediction intervals for response in Minitab
Stat >> Regression >> Regression …
Specify response and predictor(s).
Select Options…
– In the “Prediction intervals for new observations” box, specify either the X value or a column name containing multiple X values.
– Specify confidence level (default is 95%).
Click on OK.
Results appear in the session window.

Confidence intervals and prediction intervals for response in Minitab

Example: Skin cancer mortality and latitude
Predicted Values for New Observations
New Obs   Fit   SE Fit   95.0% CI          95.0% PI
                         (144.6, 155.6)    (111.2, 188.93)
                         (206.9, 236.8)    (180.6, 263.07) X
X denotes a row with X values away from the center
Values of Predictors for New Observations
New Obs   Latitude
Mean of Lat =

A plot of the confidence interval and prediction interval in Minitab
Stat >> Regression >> Fitted line plot …
Specify predictor and response.
Under Options …
– Select Display confidence bands.
– Select Display prediction bands.
– Specify desired confidence level (95% default).
Select OK.

A plot of the confidence interval and prediction interval in Minitab