Section 7.3 ~ Best-Fit Lines and Prediction Introduction to Probability and Statistics Ms. Young.

Objective (Sec. 7.3)
After this section, you will become familiar with the concept of a best-fit line for a correlation, recognize when such lines have predictive value and when they may not, understand how the square of the correlation coefficient is related to the quality of the fit, and qualitatively understand the use of multiple regression.

Line of Best-Fit
- The best-fit line (or regression line) on a scatterplot is the line that lies closer to the data points than any other possible line, according to a standard measure of closeness (usually the sum of the squared vertical distances from the points to the line, which is why it is also called the least-squares line).
- It can be used to make predictions based on existing data.
- The best-fit line should have approximately the same number of points above it as below it, and it does not have to pass through the origin.
- The precise best-fit line can be calculated by hand, but doing so is very tedious, so it is often estimated "by eye" or found with a calculator.
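The least-squares calculation is tedious by hand but short in code. The Python sketch below assumes the usual least-squares criterion (minimizing the sum of squared vertical distances); the function name best_fit_line and the data points are hypothetical, made up for illustration.

```python
def best_fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired observations")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope = (sum of cross-deviations) / (sum of squared x-deviations)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    slope = sxy / sxx
    # The least-squares line always passes through the point of means (x_bar, y_bar)
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical paired data
xs = [1, 2, 3, 4, 5]
ys = [2.0, 4.1, 5.9, 8.2, 9.8]

b0, b1 = best_fit_line(xs, ys)
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
print(f"y = {b0:.2f} + {b1:.2f}x")  # prints: y = 0.09 + 1.97x
print("residuals:", [round(e, 3) for e in residuals])
```

A related exact property: the residuals (vertical distances from each point to the line) of a least-squares line with an intercept always sum to zero, so the points above the line balance the points below it in total distance, not necessarily in count.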

Cautions in Making Predictions from Best-Fit Lines
1. Don't expect a best-fit line to give a good prediction unless the correlation is strong and there are many data points.
   - If the sample points lie very close to the best-fit line, the correlation is very strong and a prediction is more likely to be accurate.
   - If the sample points lie away from the best-fit line by substantial amounts, the correlation is weak and predictions tend to be much less accurate.
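To see how the strength of a correlation relates to prediction accuracy, the sketch below compares two hypothetical data sets with the same x values: one clustered tightly around a line and one widely scattered. For each it computes the correlation coefficient r and the root-mean-square residual, a rough "typical size" of a prediction error. Both data sets and the function names pearson_r and rms_residual are illustrative assumptions.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient r for paired data."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def rms_residual(xs, ys):
    """Root-mean-square vertical distance from the points to their least-squares line."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    intercept = y_bar - slope * x_bar
    sq_errors = [(y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)]
    return math.sqrt(sum(sq_errors) / n)


xs = [1, 2, 3, 4, 5, 6]
tight = [2.1, 3.9, 6.0, 8.1, 9.9, 12.0]  # hypothetical: points hug a line
loose = [5.0, 1.0, 9.0, 3.0, 12.0, 7.0]  # hypothetical: upward trend, lots of scatter

for label, ys in [("tight", tight), ("loose", loose)]:
    print(f"{label}: r = {pearson_r(xs, ys):.3f}, "
          f"typical error = {rms_residual(xs, ys):.2f}")
```

The tightly clustered data give r close to 1 and small errors; the scattered data give a much weaker r and errors several times larger, so a prediction from that line deserves much less trust.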

Cautions in Making Predictions from Best-Fit Lines (cont'd)
2. Don't use a best-fit line to make predictions beyond the bounds of the data points to which the line was fit.
   - Ex. ~ The accompanying diagram represents the relationship between candle length and burning time. The data collected were for candles with lengths between 2 in. and 4 in., so using the best-fit line to make predictions for lengths far outside this range would most likely be inappropriate. For example, according to the best-fit line, a candle with a length of 0 in. would burn for 2 minutes, which is impossible.
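One way to build caution 2 into a calculation is to refuse predictions outside the range of x values used to fit the line. The sketch below assumes a hypothetical candle line: the 2-minute intercept matches the example above, but the slope of 30 minutes per inch is invented, and predict_within_range is a hypothetical helper name.

```python
def predict_within_range(intercept, slope, x, x_min, x_max):
    """Predict y from a best-fit line, refusing to extrapolate beyond [x_min, x_max]."""
    if not x_min <= x <= x_max:
        raise ValueError(
            f"x = {x} is outside the fitted range [{x_min}, {x_max}]; "
            "extrapolating the best-fit line is unreliable"
        )
    return intercept + slope * x


# Hypothetical candle line: burn time (minutes) = 2 + 30 * length (inches).
# The 2-minute intercept matches the example; the slope is invented.
INTERCEPT, SLOPE = 2.0, 30.0
LENGTH_MIN, LENGTH_MAX = 2.0, 4.0  # range of candle lengths in the data

print(predict_within_range(INTERCEPT, SLOPE, 3.0, LENGTH_MIN, LENGTH_MAX))  # prints: 92.0
try:
    predict_within_range(INTERCEPT, SLOPE, 0.0, LENGTH_MIN, LENGTH_MAX)
except ValueError as err:
    print("refused:", err)
```

Raising an error is a deliberately strict design choice; a gentler version could return the prediction together with a warning flag.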

Cautions in Making Predictions from Best-Fit Lines (cont'd)
3. A best-fit line based on past data is not necessarily valid now and might not result in valid predictions of the future.
   - Ex. ~ Economists studying historical data found a strong correlation between the unemployment rate and the rate of inflation. According to this correlation, inflation should have risen dramatically in recent years when the unemployment rate fell below 6%. But inflation remained low, showing that the correlation from the old data did not continue to hold.
4. Don't make predictions about a population that is different from the population from which the sample data were drawn.
   - Ex. ~ You cannot expect that the correlation between aspirin consumption and heart attacks found in an experiment involving only men will also apply to women.
5. Remember that a best-fit line is meaningless when there is no significant correlation or when the relationship is nonlinear.
   - Ex. ~ There is no correlation between shoe size and IQ, so even though you can draw a best-fit line, it is useless for drawing any conclusions.
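The nonlinear half of caution 5 matters because r measures only linear association. In the sketch below, the hypothetical data follow y = x² exactly, yet r is 0 and the best-fit line is flat, so the line predicts poorly even though y is completely determined by x. The function name fit_line is an illustrative assumption.

```python
import math


def fit_line(xs, ys):
    """Return (intercept, slope, r) for the least-squares line through the data."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r = sxy / math.sqrt(sxx * syy)
    return intercept, slope, r


xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]  # a perfect, but nonlinear, relationship

b0, b1, r = fit_line(xs, ys)
print(f"r = {r:.3f}, best-fit line: y = {b0:.1f} + {b1:.1f}x")  # prints: r = 0.000, best-fit line: y = 2.0 + 0.0x
for x, y in zip(xs, ys):
    print(f"x = {x:2d}: actual y = {y}, line predicts {b0 + b1 * x:.1f}")
```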

Example 1
State whether the prediction (or implied prediction) should be trusted in each of the following cases, and explain why or why not.

You've found a best-fit line for a correlation between the number of hours per day that people exercise and the number of calories they consume each day. You've used this correlation to predict that a person who exercises 18 hours per day would consume 15,000 calories per day.
   - This prediction lies beyond the bounds of the data collected, so it should not be trusted.

There is a well-known but weak correlation between SAT scores and college grades. You use this correlation to predict the college grades of your best friend from her SAT scores.
   - Because the correlation is weak, there is a lot of scatter in the data, so you should not expect great accuracy in the prediction.

Historical data have shown a strong negative correlation between national birth rates and affluence. That is, countries with greater affluence tend to have lower birth rates. Because Russia has a relatively low level of affluence, these data predict a high birth rate in Russia.
   - We cannot automatically assume that the historical data still apply today. In fact, Russia currently has a very low birth rate, despite also having a low level of affluence.

Example 1 (cont'd)
A study in China has discovered correlations that are useful in designing museum exhibits that Chinese children enjoy. A curator suggests using this information to design a new museum exhibit for Atlanta-area schoolchildren.
   - The suggestion to use information from the Chinese study for an Atlanta exhibit assumes that predictions made from correlations in China also apply to Atlanta. However, given the cultural differences between China and Atlanta, the curator's suggestion should not be adopted without more information to back it up.

Scientific studies have shown a very strong correlation between children's ingestion of lead and mental retardation. Based on this correlation, paints containing lead were banned.
   - Given the strength of the correlation and the severity of the consequences, this prediction and the ban that followed seem quite reasonable. In fact, later studies established lead as an actual cause of mental retardation, making the rationale behind the ban even stronger.

The Correlation Coefficient and Best-Fit Lines
- Recall that the correlation coefficient, r, measures the strength (and direction) of a linear correlation.
- The correlation coefficient can also tell us something about the validity of predictions made with best-fit lines.
- The coefficient of determination, r², is the proportion of the variation in a variable that is accounted for by the best-fit line.
   - Ex. ~ The correlation coefficient for diamond weight and price in the scatterplot on p. 307 is r = 0.777, so r² ≈ 0.604. This means that about 60% of the variation in diamond prices is accounted for by the best-fit line relating weight and price, and the remaining 40% of the variation in price must be due to other factors.
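For a best-fit line with a single predictor, r² also equals 1 − SS_res/SS_tot, that is, one minus the share of the total variation in y (around its mean) that is left over in the residuals. The sketch below checks this numerically on hypothetical data and repeats the diamond arithmetic; the data and the function names fit_and_r and r_squared_from_residuals are illustrative assumptions.

```python
import math


def fit_and_r(xs, ys):
    """Return (intercept, slope, r) for the least-squares line through the data."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope, sxy / math.sqrt(sxx * syy)


def r_squared_from_residuals(xs, ys, intercept, slope):
    """Fraction of the variation in y accounted for by the line: 1 - SS_res / SS_tot."""
    y_bar = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical paired data
xs = [1, 2, 3, 4, 5]
ys = [2.0, 4.1, 5.9, 8.2, 9.8]
b0, b1, r = fit_and_r(xs, ys)
print(f"r^2 from r:        {r ** 2:.6f}")
print(f"1 - SS_res/SS_tot: {r_squared_from_residuals(xs, ys, b0, b1):.6f}")

# Diamond weight vs. price example: r = 0.777
explained = 0.777 ** 2
print(f"diamond example: r^2 = {explained:.3f}, "
      f"explained about {explained:.0%}, unexplained about {1 - explained:.0%}")
# prints: diamond example: r^2 = 0.604, explained about 60%, unexplained about 40%
```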

Example 2
You are the manager of a large department store. Over the years, you've found a reasonably strong positive correlation between your September sales and the number of employees you'll need to hire for peak efficiency during the holiday season. The correlation coefficient is r = 0.950. This year your September sales are fairly strong. Should you start advertising for help based on the best-fit line?
   - r² ≈ 0.903, which means that about 90% of the variation in the number of peak-season employees can be accounted for by a linear relationship with September sales, leaving only about 10% unaccounted for.
   - Because 90% is so high, it is a good idea to predict the number of employees you'll need using the best-fit line.
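The arithmetic behind Example 2: since the correlation is positive, r is the positive square root of the stated r², and the share of the variation left unexplained is 1 − r².

```latex
\[
r = +\sqrt{r^2} = \sqrt{0.903} \approx 0.950,
\qquad
1 - r^2 = 1 - 0.903 = 0.097 \approx 10\%.
\]
```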

Multiple Regression
- Multiple regression is a technique that allows us to find a best-fit equation relating one variable to more than one other variable.
   - Ex. ~ The price of a diamond in relation to its carat weight, cut, clarity, and color.
- The coefficient of determination, R², is the most common measure of fit in multiple regression.
   - It tells us how much of the scatter in the data is accounted for by the best-fit equation.
   - If R² is close to 1, the best-fit equation should be very useful for making predictions within the range of the data.
   - If R² is close to 0, predictions from the equation are essentially useless.
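A best-fit equation with several predictors can be sketched by solving the least-squares normal equations (XᵀX)b = Xᵀy directly. This pure-Python version is only an illustration; statistical software typically uses more numerically stable methods such as QR decomposition. The data are hypothetical, and the function names solve_linear_system, fit_multiple_regression, and r_squared are made up for this sketch.

```python
def solve_linear_system(a, b):
    """Solve the square system a·x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Augmented matrix [a | b]; copies rows so the inputs are not modified
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Swap in the row with the largest entry in this column (partial pivoting)
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from every row below the pivot
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_multiple_regression(rows, ys):
    """Return [b0, b1, ..., bk] for y ≈ b0 + b1*x1 + ... + bk*xk (least squares)."""
    design = [[1.0] + list(row) for row in rows]  # leading 1.0 column gives the intercept
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def r_squared(rows, ys, coefs):
    """Coefficient of determination R² = 1 - SS_res / SS_tot."""
    y_bar = sum(ys) / len(ys)
    preds = [coefs[0] + sum(b * x for b, x in zip(coefs[1:], row)) for row in rows]
    ss_res = sum((y - pred) ** 2 for y, pred in zip(ys, preds))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical data with two predictors (x1, x2)
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6), (6, 5)]
exact = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]  # y = 1 + 2*x1 + 3*x2 exactly
noise = [0.5, -0.3, 0.2, -0.4, 0.1, -0.1]          # made-up measurement noise
noisy = [y + e for y, e in zip(exact, noise)]

coefs_exact = fit_multiple_regression(rows, exact)
coefs_noisy = fit_multiple_regression(rows, noisy)
print("exact fit:", [round(b, 6) for b in coefs_exact],
      "R² =", round(r_squared(rows, exact, coefs_exact), 6))
print("noisy fit:", [round(b, 3) for b in coefs_noisy],
      "R² =", round(r_squared(rows, noisy, coefs_noisy), 3))
```

With exact data the fit recovers the coefficients 1, 2, 3 and R² is 1; with the small added noise R² stays just below 1, so predictions within the range of these data would still be useful.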