Statistical Reasoning for Everyday Life
Intro to Probability and Statistics
Mr. Spering – Room 113
7.3 Regression Lines and Predictions
Correlation: A correlation exists between two variables when higher values of one variable consistently go with higher values of the other, or when higher values of one variable consistently go with lower values of the other.
EXPLAIN THE CORRELATION
Time traveled and distance traveled: POSITIVE
Cell phone minutes and bill amount ($): POSITIVE
Wrist circumference and height: POSITIVE
Eye color and automobile color: NO CORRELATION
Scatter Plots: A scatter plot is a graph in which each point corresponds to the values of two variables (not necessarily an independent and a dependent variable).
The best-fit line (or regression line) on a scatter plot is the line that lies closer to the data points than any other possible line, according to the standard statistical measure of closeness.
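The standard measure of closeness is usually taken to be least squares: the best-fit line minimizes the sum of the squared vertical distances from the points to the line. A minimal Python sketch of that calculation is given here. The function name best_fit_line and the travel data are made up for illustration and are not part of the lesson.

```python
from statistics import mean


def best_fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of squared deviations in x, and sum of cross-deviations of x and y.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x-values are identical; no unique line exists")
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Invented data: hours traveled and miles traveled.
hours = [1, 2, 3, 4, 5]
miles = [52, 98, 151, 203, 246]

slope, intercept = best_fit_line(hours, miles)
print(f"miles = {slope:.1f} * hours + {intercept:.1f}")
```

The slope estimates how many miles are added per extra hour, and the intercept is where the line crosses the vertical axis.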
Correlations with lines of best fit can be used to make predictions. However, “it is a capital mistake to theorize before one has data.” -- Arthur Conan Doyle
What did he write?
Correlation does not imply causality.
Cautions in Making Predictions from Best-Fit Lines
1. Don’t expect a good prediction unless the correlation is significant. If the sample points lie very close to the regression line, the correlation is very strong and the prediction is more likely to be accurate. If the sample points lie away from the line by substantial amounts, the correlation is weak and predictions tend to be less accurate.
2. Don’t use regression lines to make predictions beyond the bounds of the data points.
3. A best-fit line based on past data is not necessarily valid now and might not result in valid predictions.
4. Don’t make predictions about a population that is different from the population from which the sample data were drawn. BE SPECIFIC.
5. Remember that a linear regression should not be used for predictions when there is no correlation, or when the relationship is non-linear.
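The caution about predicting beyond the bounds of the data can be built into a prediction routine. The sketch below is one possible way to do it. The function predict_within_range, the sample of exercise hours, and the slope and intercept are all invented for illustration.

```python
def predict_within_range(x_new, xs, slope, intercept):
    """Predict y from a best-fit line, but only inside the observed x-range."""
    low, high = min(xs), max(xs)
    if not (low <= x_new <= high):
        raise ValueError(
            f"x = {x_new} is outside the data range [{low}, {high}]; "
            "the line should not be used to predict here."
        )
    return slope * x_new + intercept


# Invented sample: hours of exercise per day for the people surveyed.
exercise_hours = [0.5, 1.0, 1.5, 2.0, 3.0]
# Invented best-fit line: calories = 400 * hours + 1800.
slope, intercept = 400, 1800

# Inside the data range, the line gives a prediction.
print(predict_within_range(2.0, exercise_hours, slope, intercept))

# Far outside the data range, the routine refuses to predict.
try:
    predict_within_range(18, exercise_hours, slope, intercept)
except ValueError as err:
    print("Not trusted:", err)
```

A range check like this only covers one caution. It says nothing about whether the correlation is strong, whether the data are current, or whether the population matches.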
Best-fit lines and r² (coefficient of determination): The square of the correlation coefficient, r², is the proportion of the variation in a variable that can be explained by the best-fit line.
PREDICTION?
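As a rough illustration of how r and r² are computed, the sketch below uses the usual formula for the linear (Pearson) correlation coefficient and then squares it. The function name correlation_coefficient and the cell phone data are assumptions made for this example only.

```python
from math import sqrt
from statistics import mean


def correlation_coefficient(xs, ys):
    """Return the linear correlation coefficient r for paired data."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable does not vary")
    return sxy / sqrt(sxx * syy)


# Invented data: cell phone minutes used and bill amount ($).
minutes = [100, 200, 300, 400, 500]
bill = [32, 41, 55, 58, 74]

r = correlation_coefficient(minutes, bill)
r_squared = r ** 2
print(f"r = {r:.3f}, r squared = {r_squared:.3f}")
```

An r² of 0.64, for example, would mean that about 64% of the variation in one variable is explained by the best-fit line, with the rest left unexplained.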
VALID PREDICTIONS: STATE WHETHER THE PREDICTION SHOULD BE TRUSTED. EXPLAIN.
1. You’ve found a correlation between the number of hours per day people exercise and the number of calories they consume each day. You’ve used this correlation to predict that a person who exercises 18 hours per day would consume 15,000 calories.
Answer: No one exercises for 18 hours a day, so the prediction lies beyond the bounds of the data. NOT TRUSTED
2. There is a well-known weak correlation between SAT scores and college grades. You use this to predict your college math grade.
Answer: The weakness of the correlation makes it hard to make specific predictions. Although the correlation might be statistically significant, a prediction based on it will not be very accurate. NOT TRUSTED
VALID PREDICTIONS: STATE WHETHER THE PREDICTION SHOULD BE TRUSTED. EXPLAIN.
3. Historical data have shown a strong negative correlation between national birth rates and affluence. That is, countries with greater affluence tend to have lower birth rates. These data predict a high birth rate in Russia.
Answer: We should not necessarily assume that historical data will apply today. In fact, Russia had a very low birth rate throughout the 1990s, despite a low level of affluence. NOT TRUSTED
4. Based on a large data set, you’ve made a scatter plot for salsa consumption (per person) versus years of education. The diagram shows no correlation, but your calculator has given a line of regression regardless. The line predicts that someone who consumes 1 pint of salsa per week has at least 13 years of education.
Answer: There is no correlation, so the regression line and its prediction are meaningless. NOT TRUSTED
MULTIPLE REGRESSION
Multiple regression allows the calculation of a best-fit equation that represents the best fit between one variable (such as price) and a combination of two or more other variables (such as weight and color). The coefficient of determination, r², tells us the proportion of the scatter in the data accounted for by the best-fit equation.
Be patient: more to come on multiple regression. There are no secrets in mathematics.
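For a first look at what a multiple regression calculation involves, here is a minimal sketch for the case of exactly two predictor variables, fitted by least squares. The function name fit_two_predictors, the numeric color grade, and all of the data are hypothetical. Real statistical software handles any number of predictors and reports much more detail.

```python
from statistics import mean


def fit_two_predictors(x1, x2, y):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2.

    Returns (b0, b1, b2, r_squared).
    """
    if not (len(x1) == len(x2) == len(y)) or len(y) < 3:
        raise ValueError("need at least three rows of equal-length data")
    m1, m2, my = mean(x1), mean(x2), mean(y)
    # Deviations of every value from its own mean.
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    dy = [v - my for v in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * c for a, c in zip(d1, dy))
    s2y = sum(b * c for b, c in zip(d2, dy))
    det = s11 * s22 - s12 ** 2
    if det == 0:
        raise ValueError("the predictors are perfectly related; no unique fit")
    # Solve the two least-squares equations for the two slopes.
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2
    # Coefficient of determination: share of the scatter in y that is explained.
    sst = sum(c * c for c in dy)
    if sst == 0:
        raise ValueError("y does not vary, so r squared is undefined")
    sse = sum((yi - (b0 + b1 * a + b2 * b)) ** 2 for a, b, yi in zip(x1, x2, y))
    return b0, b1, b2, 1 - sse / sst


# Hypothetical data: weight, a numeric color grade, and price ($).
weight = [0.3, 0.5, 0.7, 1.0, 1.2, 1.5]
color_grade = [2, 5, 3, 6, 1, 4]
price = [600, 900, 1700, 2400, 4100, 4900]

b0, b1, b2, r_squared = fit_two_predictors(weight, color_grade, price)
print(f"price = {b0:.0f} + {b1:.0f} * weight + {b2:.0f} * color_grade")
print(f"r squared = {r_squared:.3f}")
```

The same cautions apply as for a single best-fit line: a high r² on made-up or unrepresentative data does not make the predictions trustworthy.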
HOMEWORK: pg 314 # 1-9 all, #13