Copyright © 2010 Pearson Education, Inc. Slide 8 - 1 The correlation between two scores X and Y equals 0.8. If both the X scores and the Y scores are converted.

Presentation transcript:

Copyright © 2010 Pearson Education, Inc. Slide The correlation between two scores X and Y equals 0.8. If both the X scores and the Y scores are converted to z-scores, then the correlation between the z-scores for X and the z-scores for Y would be: a b c. 0.0 d. 0.2 e. 0.8

Chapter 8 Linear Regression

Fat Versus Protein: An Example. The following is a scatterplot of total fat versus protein for 30 items on the Burger King menu, with a correlation of 0.83: (scatterplot not shown)

The Linear Model. The linear model (also called the line of best fit, least squares line, or regression line) is the equation of a straight line through the data that shows how the values are associated. Using this line, we can predict values. Predicted values are denoted $\hat{y}$ (read "y-hat"); the hat tells you they are predicted values. The difference between the observed value and the predicted value is called the residual: residual = observed - predicted = $y - \hat{y}$

A negative residual means the predicted value is too big (an overestimate). A positive residual means the predicted value is too small (an underestimate). In the figure, the estimated fat of the BK Broiler chicken sandwich is 36 g, while the true value of fat is 25 g. What is the residual?
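The residual rule above can be checked with a few lines of Python. This is only a sketch: the function names `residual` and `describe` are made up for illustration, and the two numbers are the BK Broiler figures quoted on the slide.

```python
def residual(observed, predicted):
    """Residual = observed value minus predicted value."""
    return observed - predicted


def describe(res):
    """Say whether the model overestimated or underestimated."""
    if res < 0:
        return "overestimate"   # prediction was too big
    if res > 0:
        return "underestimate"  # prediction was too small
    return "exact"


# BK Broiler figures from the slide: predicted 36 g of fat, observed 25 g.
bk_residual = residual(observed=25, predicted=36)
verdict = describe(bk_residual)
print(bk_residual, verdict)
```

A negative result here means the line predicted more fat than the sandwich actually has.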

Best Fit Means Least Squares. Some residuals are positive, others are negative, and, on average, they cancel each other out. To measure how well the line fits the data, we square the residuals (to eliminate the negatives) and then add up the squares. The smaller the sum, the better the fit. That is why the line is also called the least squares line.
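To make the "smaller sum, better fit" idea concrete, here is a small sketch that compares two candidate lines on the same data. The data points and both candidate lines are invented for illustration; they do not come from the slides.

```python
# Hypothetical (x, y) data, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]


def sum_squared_residuals(b0, b1):
    """Sum of squared residuals for the line y-hat = b0 + b1 * x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


sse_close = sum_squared_residuals(0.0, 2.0)  # candidate line: y-hat = 2x
sse_far = sum_squared_residuals(1.0, 1.0)    # candidate line: y-hat = 1 + x

# The line with the smaller sum of squares fits these points better.
print(sse_close, sse_far)
```

The least squares line is the one line for which this sum is as small as it can possibly be.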

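If the variables are standardized (converted to z-scores, that is, measured in standard deviations from the mean), the equation of the line of best fit is: $\hat{z}_y = r z_x$. The correlation (also called r) is unchanged by standardizing x and y, and it becomes the slope of this line. Therefore, moving one standard deviation away from the mean in x moves our estimate r standard deviations away from the mean in y.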
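The following sketch illustrates two facts about standardized variables: converting to z-scores does not change the correlation, and the prediction in z-score units is r times the z-score of x. The five data points are hypothetical, chosen so that r works out to 0.8; the helper names are assumptions for this example.

```python
from statistics import mean, stdev


def z_scores(values):
    """Convert values to z-scores using the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def correlation(xs, ys):
    """Correlation as the sum of z-score products divided by n - 1."""
    zx, zy = z_scores(xs), z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / (len(xs) - 1)


# Hypothetical data, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]

r = correlation(xs, ys)
r_of_z = correlation(z_scores(xs), z_scores(ys))  # same value as r


def predicted_z_y(z_x):
    """Line of best fit for standardized variables: z_y-hat = r * z_x."""
    return r * z_x


print(r, r_of_z, predicted_z_y(1.0))
```

With r = 0.8, a point 1 standard deviation above the mean in x is predicted to be 0.8 standard deviations above the mean in y.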

Example: A scatterplot of house prices vs. house size shows a relationship that is straight, with only moderate scatter and no outliers. The correlation between house price and house size is [value not preserved].
a. You go to an open house and find the house is 1 standard deviation above the mean in size. What would you guess about its price?
b. You read an ad for a house priced 2 standard deviations below the mean. What would you guess about its size?
c. A friend tells you about a house whose size in square meters (he's European) is 1.5 standard deviations above the mean. What would you guess about its size in square feet?

Sometimes we are given the regression line in REAL UNITS! The regression line for the Burger King data fits the data well. The equation is: [equation not preserved]. Example: What is the predicted fat content for a BK Broiler chicken sandwich (with 30 g of protein)?

To find the regression line (in real units), you may be given the standard deviations, correlation, and means, OR you may be given raw data.

First make sure a regression is appropriate. Since regression and correlation are closely related, we need to check the same conditions for regression as we did for correlation: the Quantitative Variables Condition; the Straight Enough Condition (look at the scatterplot); and the Outlier Condition (look at the scatterplot).

To create the regression line in real units given the standard deviations, correlation (r), and means: You know the equation of a line. In Statistics we use a slightly different notation: $\hat{y} = b_0 + b_1 x$. We write $b_1$ and $b_0$ for the slope and intercept of the line. The slope is $b_1 = r \frac{s_y}{s_x}$ (slope is always in units of y per unit of x), and the intercept is $b_0 = \bar{y} - b_1 \bar{x}$.
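Here is a minimal sketch of building the line from summary statistics. It assumes the standard least squares relations: slope b1 = r * s_y / s_x and intercept b0 = y-bar - b1 * x-bar. The function name and all the numbers are hypothetical.

```python
def regression_from_summary(r, mean_x, sd_x, mean_y, sd_y):
    """Return (b0, b1) for y-hat = b0 + b1 * x from summary statistics."""
    b1 = r * sd_y / sd_x         # slope, in units of y per unit of x
    b0 = mean_y - b1 * mean_x    # intercept: line passes through the means
    return b0, b1


# Hypothetical summary statistics, invented for illustration only.
b0, b1 = regression_from_summary(r=0.5, mean_x=10.0, sd_x=2.0,
                                 mean_y=50.0, sd_y=8.0)

# Prediction at the mean of x should be the mean of y.
prediction_at_mean = b0 + b1 * 10.0
print(b0, b1, prediction_at_mean)
```

Because the intercept is computed from the means, the line always passes through the point (x-bar, y-bar).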

To find a regression line (linear model) with raw data: use your calculator! First, be sure to check that the variables are quantitative, that the relationship is linear (scatterplot), and that there are no outliers (scatterplot). If the data are not quantitative, not linear, or contain outliers, you should NOT model them with a linear model.

TI Tips: Equation of the Regression Line. Press STAT, go to CALC, and choose LinReg(a + bx) (not the first LinReg entry; scroll down to the second one). Specify that x and y are YR and TUIT (the lists we entered in the calculator earlier).

TI Tips: Equation of the Regression Line Graphed on the Scatterplot. Press STAT, go to CALC, and choose LinReg(a + bx) (not the first LinReg entry; scroll down to the second one). Specify that x and y are YR and TUIT (the lists we entered in the calculator earlier). We want the screen to say: LinReg(a+bx) YR, TUIT, Y1 (this sends the equation to Y1 so that it appears on the graph). To add Y1 to the end: press VARS, choose Y-VARS, then 1:Function, and choose Y1. Press ENTER to see the equation, which has also been placed in Y1. Press GRAPH.
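For readers without a TI calculator, the same kind of fit can be sketched in Python. This is not the calculator's own routine, just the standard least squares formulas written out. The list names `yr` and `tuit` mirror the YR and TUIT lists mentioned above, but the values are invented for illustration.

```python
from statistics import mean


def lin_reg(xs, ys):
    """Least squares fit of y-hat = a + b * x; returns (a, b)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx          # slope
    a = y_bar - b * x_bar  # intercept
    return a, b


# Hypothetical lists standing in for YR and TUIT (values invented).
yr = [0, 1, 2, 3, 4]
tuit = [6000, 6300, 6500, 6900, 7100]

a, b = lin_reg(yr, tuit)
print(f"y-hat = {a:.1f} + {b:.1f} x")
```

The returned `a` and `b` play the same roles as the a and b reported by LinReg(a + bx).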

Example: Using the relationship between house price (in thousands of dollars) and house size (in thousands of square feet), the regression model is: [equation not preserved].
a. What is the slope and what does it mean?
b. What are the units of the slope?
c. Your house is 2000 square feet bigger than your neighbor's house. How much more do you expect it to be worth?
d. Is the y-intercept of [value not preserved] meaningful? Explain.

Example: The linear model relating hurricanes' wind speeds to their central pressures was: Predicted MaxWindSpeed = (.897)CentralPressure. Hurricane Katrina had a central pressure measured at 920 millibars. What does our regression model predict for her maximum wind speed? How good is that prediction, given that Katrina's actual wind speed was measured at 110 knots?

More about Residuals. A scatterplot of all the residuals should look completely random! It should show no bends and should have no outliers.

Draw examples of a residual graph that is not random.

TI Tips: Residual Plots. You look at the scatterplot to make sure the relationship is linear, but sometimes it is hard to tell. After you run a regression, make a residual plot. If the residual plot looks completely random, you know your scatterplot was linear. The calculator automatically stores the residuals in a list named RESID after you run a regression. To look at them: STAT, EDIT, then cursor over to RESID. To create the residual plot: STAT PLOT, Plot2, Xlist: YR and Ylist: RESID. Y= may still have your regression line in it; you can either turn it off or remove it. Then use ZoomStat. Do you see a curve?
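The sketch below shows what "seeing a curve" in the residuals looks like numerically. It fits a straight line to deliberately curved data (y = x squared, invented for illustration) and lists the residuals. The helper `lin_reg` is a hypothetical name that uses the standard least squares formulas.

```python
from statistics import mean


def lin_reg(xs, ys):
    """Least squares fit of y-hat = a + b * x; returns (a, b)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b


# Curved data on purpose: y = x squared (invented for illustration).
xs = [0, 1, 2, 3, 4]
ys = [x ** 2 for x in xs]

a, b = lin_reg(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

# Signs of the residuals in x order; a random plot would not show
# one long run of the same sign in the middle.
signs = "".join("+" if e > 0 else "-" for e in residuals)
print(residuals, signs)
```

The residuals start positive, turn negative in the middle, and end positive again. That U shape is the bend a residual plot would reveal, even though the residuals still sum to zero.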

Example: Our linear model for homes uses the model: predicted price = (94.454)(size)
a. Would you prefer to find a home with a negative or a positive residual? Explain.
b. You plan to look for a home of about 3000 square feet. How much should you expect to have to pay?
c. You find a nice home that size selling for $300,000. What's the residual?

The Residual Standard Deviation. The standard deviation of the residuals, $s_e$, measures how much the points spread around the regression line. Errors in predictions based on this model have a standard deviation of $s_e$ (measured in y units). We estimate the SD of the residuals using: $s_e = \sqrt{\frac{\sum e^2}{n - 2}}$
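As a sketch, the residual standard deviation can be computed directly from a list of residuals. This assumes the usual formula, the square root of the sum of squared residuals divided by n - 2. The residual values below are hypothetical.

```python
import math


def residual_sd(residuals):
    """s_e = sqrt(sum of squared residuals / (n - 2))."""
    n = len(residuals)
    if n <= 2:
        raise ValueError("need more than two residuals")
    return math.sqrt(sum(e * e for e in residuals) / (n - 2))


# Hypothetical residuals from some fitted line (invented values).
residuals = [2, -1, -2, -1, 2]
s_e = residual_sd(residuals)
print(round(s_e, 3))
```

The divisor is n - 2 rather than n - 1 because two quantities, the slope and the intercept, were estimated from the data.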

$R^2$: The Variation Accounted For (cont.) All regression analyses include this statistic, although by tradition, it is written $R^2$ (pronounced "R-squared"). An $R^2$ of 0 means that none of the variance in the data is in the model; all of it is still in the residuals. When interpreting a regression model, you need to tell what $R^2$ means: it is the percentage of the variability in y that is explained by x. $R^2$ is always between 0% and 100%. What makes a good $R^2$ value depends on the kind of data you are analyzing and on what you want to do with it. Always report the slope and intercept for a regression, along with $R^2$, so that readers can judge for themselves how successful the regression is at fitting the data.
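One way to see what R-squared measures is to compute it as 1 minus the ratio of leftover (residual) variation to total variation in y. The sketch below does this on hypothetical data; the helper name and data values are invented for illustration.

```python
from statistics import mean


def lin_reg(xs, ys):
    """Least squares fit of y-hat = a + b * x; returns (a, b)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b


# Hypothetical data, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]

a, b = lin_reg(xs, ys)
y_bar = mean(ys)

sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # left in residuals
sst = sum((y - y_bar) ** 2 for y in ys)                    # total variation in y

r_squared = 1 - sse / sst
print(f"R^2 = {r_squared:.0%}")
```

For these points the correlation is 0.8, and the computed R-squared is 0.8 squared, or 64%: the line accounts for 64% of the variability in y.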

Assumptions and Conditions. Quantitative Variables Condition: Regression can only be done on two quantitative variables (and not on two categorical variables). Straight Enough Condition: The linear model assumes that the relationship between the variables is linear (check with a scatterplot).

Assumptions and Conditions (cont.) If the scatterplot is not straight enough, stop here. You can only use a linear model on two variables that are related linearly! Some nonlinear relationships can be saved by re-expressing the data to make the scatterplot more linear.

Assumptions and Conditions (cont.) It's a good idea to check linearity again after computing the regression, when we can examine the residuals. Does the Plot Thicken? Condition: Residual plots should show random scatter with roughly even spread throughout. Don't confuse this with Normal probability plots from unit one (used to see whether data follow a Normal curve), which should look like a straight line.

Assumptions and Conditions (cont.) Outlier Condition: Watch out for outliers. Outlying points can dramatically change a regression model. Outliers can even change the sign of the slope, misleading us about the underlying relationship between the variables.
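The claim that one outlier can flip the sign of the slope is easy to demonstrate. In this sketch, five points lie exactly on an upward line, and a single invented outlier is then added. All values and the helper name are hypothetical.

```python
from statistics import mean


def slope(xs, ys):
    """Least squares slope of y on x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


# Five hypothetical points lying exactly on the line y = x.
xs = [1, 2, 3, 4, 5]
ys = [1, 2, 3, 4, 5]
slope_without = slope(xs, ys)

# Add one outlier far to the right and far below the pattern.
slope_with = slope(xs + [20], ys + [-10])

print(slope_without, slope_with)
```

Without the outlier the slope is positive; with it, the fitted slope turns negative, even though five of the six points still follow a rising pattern.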

What Can Go Wrong? Don't fit a straight line to a nonlinear relationship. Beware extraordinary points (y-values that stand off from the linear pattern or extreme x-values). Don't extrapolate beyond the data: the linear model may no longer hold outside of the range of the data. Don't infer that x causes y just because there is a good linear model for their relationship: association is not causation. Don't choose a model based on $R^2$ alone.

A few IMPORTANT things to remember:
The percentage of variability in y that is explained by x is $r^2$ (an example of this will be homework problem #7).
Correlation: $r = \pm\sqrt{r^2}$ (you need to decide whether it is + or - for a positive or negative correlation).
residual = observed - predicted = $y - \hat{y}$.
$R^2$ tells you how well the actual data fit the model (1 is perfect, 0 is no linear correlation).
$1 - r^2$ is the fraction of the original variance left in the residuals.
Be careful not to use a regression to extrapolate (predict values beyond the scope or time frame of the model).
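Two of the reminders above can be sketched in code: recovering r from R-squared using the sign of the slope, and finding the fraction of variance left in the residuals. The function name and the input values are hypothetical.

```python
import math


def correlation_from_r_squared(r_squared, slope):
    """Recover r from R^2; r takes the same sign as the slope."""
    r = math.sqrt(r_squared)
    return -r if slope < 0 else r


# Hypothetical regression output: R^2 = 0.64 with a negative slope.
r_squared = 0.64
r = correlation_from_r_squared(r_squared, slope=-1.5)

# Fraction of the original variance in y left in the residuals.
leftover = 1 - r_squared

print(r, leftover)
```

The square root alone cannot tell you the direction of the association, which is why the sign has to come from the slope or the scatterplot.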

Homework (Day 1) Pg odd (skip 9)