1.9 Comparing Two Data Sets

Revisiting Go For the Gold! 3a) Whose slope is larger? The women’s is growing faster (0.01875 m/year > 0.01353 m/year).

Revisiting Go For the Gold! 3b) Residuals? The men’s residuals seem slightly more scattered. The women’s residuals have more of a pattern.

Revisiting Go For the Gold! 3b) r values? The women have a strong positive linear correlation (r = 0.83). The men have a weaker but still strong positive linear correlation (r = 0.70).

Revisiting Go For the Gold! 3b) r² values? About 69% of the variation in the women’s distances is accounted for by the linear trend over the years (r² = 0.69). The men have a much less reliable fit: only 49% of the variation in the men’s distances is accounted for by the trend (r² = 0.49).
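The r² figures follow directly from the r values on the previous slide. A quick check, assuming nothing beyond the two correlations quoted there (the function name is only illustrative):

```python
# Correlation coefficients quoted on the "r values?" slide.
r_women = 0.83
r_men = 0.70


def r_squared(r):
    """Coefficient of determination for a simple linear fit."""
    return r * r


print(round(r_squared(r_women), 2))  # 0.69
print(round(r_squared(r_men), 2))    # 0.49
```

Squaring is all that is needed here because each model has a single explanatory variable (the year).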

Revisiting Go For the Gold! 3b) Predicted values? Both predicted values for 2012 are off by more than 1 m! Neither model is extremely reliable. The women’s model seems generally more reliable, although the predicted value is worse in the women’s case.

Revisiting Go For the Gold! 3c) y-intercepts? In year 0, the models have both the men and the women jumping backwards! The intercepts have little real meaning, which shows the limits of this model’s predictive value.

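Revisiting Go For the Gold! 3d) When will they jump equal distances? Find the year x when the two predicted distances are the same. Set the women’s line equal to the men’s line and solve for x. The slopes are those from 3a; b_w and b_m stand for the sizes of the two negative y-intercepts.
Women: y = 0.01875x – b_w
Men: y = 0.01353x – b_m
0.01875x – b_w = 0.01353x – b_m
0.00522x = b_w – b_m
x = (b_w – b_m) / 0.00522
The men and women will jump equal distances in that year. Don’t wait up.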
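Finding the crossing year amounts to solving a·x + c = b·x + d for x. A minimal sketch follows. The function name is an assumption, and the two lines in the usage example are made-up toy lines, not the Olympic models. Only the slope gap at the end uses figures from the slides.

```python
def crossing_x(slope_a, intercept_a, slope_b, intercept_b):
    """Return the x at which two straight lines give the same y."""
    if slope_a == slope_b:
        raise ValueError("parallel lines never cross")
    return (intercept_b - intercept_a) / (slope_a - slope_b)


# Toy lines (hypothetical): y = 2x - 10 and y = x - 4 meet at x = 6.
print(crossing_x(2, -10, 1, -4))  # 6.0

# From the slides: the women gain on the men by this many metres per year.
slope_gap = 0.01875 - 0.01353
print(round(slope_gap, 5))  # 0.00522
```

Because the slope gap is only about half a centimetre per year, even a modest difference in intercepts pushes the crossing point far into the future.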

4. Comparing the two sets of data
For each metre the men’s distance increases, the women’s increases by 0.95 m. When the men jump 0 m, the women jump -1.1 m. Backwards? Or just that the women’s distances will generally be less than the men’s, which seems reasonable. Remember that we are comparing like quantities. The y-intercept lowers the line.

4. Comparing the two sets of data
r = 0.81: strong positive linear correlation. r² ≈ 0.66 (from r = 0.81), so about 34% of the change in the women’s distances is due to random fluctuations. Residuals: scattered, so a linear fit is a good model. We could probably use it to predict women’s distances based on the men’s (or vice versa). Knowing the men’s distance in 2012 is 8.31 m, the women’s distance should be 6.8 m (actual is 7.12 m).
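The 6.8 m prediction can be reproduced from the slope (0.95) and y-intercept (-1.1) given on the previous slide. A small sketch, with hypothetical constant and function names:

```python
# Men-vs-women model from the slides: women = 0.95 * men - 1.1
SLOPE = 0.95
INTERCEPT = -1.1


def predict_women(men_distance_m):
    """Predicted women's distance (m) for a given men's distance (m)."""
    return SLOPE * men_distance_m + INTERCEPT


predicted = predict_women(8.31)   # men's 2012 distance
residual = 7.12 - predicted       # actual minus predicted

print(round(predicted, 2))  # 6.79
print(round(residual, 2))   # 0.33
```

The residual of about a third of a metre is much smaller than the misses of more than 1 m from the year-based models.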

5. Effect of Outliers

Be careful to only remove the data from the men’s or women’s side.
Should we remove it at all?
– Typo or other human error?
– Is the sample representative of the population?
– Is this merely a bad regression model?

5. Effect of Outliers
Note that removing the outlier increases the correlation for the men’s model but decreases it for the women’s!
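Removing one point can move r in either direction, depending on whether that point sat off the trend or helped define it. The sketch below illustrates this with made-up toy data (not the Olympic results). The helper names are assumptions.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def r_without(xs, ys, index):
    """Correlation after dropping the point at position `index`."""
    return pearson_r(xs[:index] + xs[index + 1:], ys[:index] + ys[index + 1:])


# Hypothetical data: the last point is far off the trend in one set
# and extends the trend in the other.
xs = [1, 2, 3, 4, 5, 6]
off_trend = [1, 2, 3, 4, 5, 0]
on_trend = [1, 3, 2, 4, 3, 6]

for name, ys in [("off_trend", off_trend), ("on_trend", on_trend)]:
    print(name, round(pearson_r(xs, ys), 2), round(r_without(xs, ys, 5), 2))
```

In the first set the dropped point was pulling the line away from the rest of the data, so r rises. In the second it was anchoring the upward trend, so r falls.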

5. Effect of Outliers
Consider the residuals. The linear model is still not a good one for this data. A logarithmic model is probably better.

5. Effect of Outliers
The point is a likely outlier for the men’s data, but probably not for the women’s data.

5. Effect of Outliers
Removing the outliers doesn’t affect the men vs. women model too much. The predicted value for the women’s distance in 2012 is the same: 6.8 m.