1.8 Critical Analysis: Outliers

Slides:



Advertisements
Similar presentations
Residuals.
Advertisements

Section 10-3 Regression.
Inference for Linear Regression (C27 BVD). * If we believe two variables may have a linear relationship, we may find a linear regression line to model.
Inference for Regression
LINEAR REGRESSION: Evaluating Regression Models. Overview Assumptions for Linear Regression Evaluating a Regression Model.
Slide Copyright © 2010 Pearson Education, Inc. Active Learning Lecture Slides For use with Classroom Response Systems Business Statistics First Edition.
Correlation and Regression. Relationships between variables Example: Suppose that you notice that the more you study for an exam, the better your score.
Marketing Research Aaker, Kumar, Day and Leone Tenth Edition
Copyright © 2010, 2007, 2004 Pearson Education, Inc. All Rights Reserved Section 10-3 Regression.
Relationship of two variables
Linear Regression: Test scores vs. HW scores. Positive/Negative Correlation.
AP Statistics Section 15 A. The Regression Model When a scatterplot shows a linear relationship between a quantitative explanatory variable x and a quantitative.
Section Copyright © 2014, 2012, 2010 Pearson Education, Inc. Lecture Slides Elementary Statistics Twelfth Edition and the Triola Statistics Series.
Notes Bivariate Data Chapters Bivariate Data Explores relationships between two quantitative variables.
1 Chapter 10 Correlation and Regression 10.2 Correlation 10.3 Regression.
Chapter 10 Correlation and Regression
Notes Bivariate Data Chapters Bivariate Data Explores relationships between two quantitative variables.
Review Multiple Choice Regression: Chapters 7, 8, 9.
Basic Concepts of Correlation. Definition A correlation exists between two variables when the values of one are somehow associated with the values of.
Verbal SAT vs Math SAT V: mean=596.3 st.dev=99.5 M: mean=612.2 st.dev=96.1 r = Write the equation of the LSRL Interpret the slope of this line Interpret.
REGRESSION DIAGNOSTICS Fall 2013 Dec 12/13. WHY REGRESSION DIAGNOSTICS? The validity of a regression model is based on a set of assumptions. Violation.
Examining Bivariate Data Unit 3 – Statistics. Some Vocabulary Response aka Dependent Variable –Measures an outcome of a study Explanatory aka Independent.
Correlations: Relationship, Strength, & Direction Scatterplots are used to plot correlational data – It displays the extent that two variables are related.
Copyright ©2006 Brooks/Cole, a division of Thomson Learning, Inc. More About Regression Chapter 14.
Section Copyright © 2014, 2012, 2010 Pearson Education, Inc. Chapter 10 Correlation and Regression 10-2 Correlation 10-3 Regression.
GOAL: I CAN USE TECHNOLOGY TO COMPUTE AND INTERPRET THE CORRELATION COEFFICIENT OF A LINEAR FIT. (S-ID.8) Data Analysis Correlation Coefficient.
MATH 2311 Section 5.4. Residuals Examples: Interpreting the Plots of Residuals The plot of the residual values against the x values can tell us a lot.
We will use the 2012 AP Grade Conversion Chart for Saturday’s Mock Exam.
Correlation & Linear Regression Using a TI-Nspire.
1 Objective Given two linearly correlated variables (x and y), find the linear function (equation) that best describes the trend. Section 10.3 Regression.
Lecture Slides Elementary Statistics Twelfth Edition
Statistics 200 Lecture #6 Thursday, September 8, 2016
Lesson 4.5 Topic/ Objective: To use residuals to determine how well lines of fit model data. To use linear regression to find lines of best fit. To distinguish.
Statistics 101 Chapter 3 Section 3.
Section 11.2 Day 5.
Cautions about Correlation and Regression
Chapter 12: Regression Diagnostics
Scatterplots A way of displaying numeric data
Suppose the maximum number of hours of study among students in your sample is 6. If you used the equation to predict the test score of a student who studied.
Regression and Residual Plots
Lecture Slides Elementary Statistics Thirteenth Edition
2. Find the equation of line of regression
CHAPTER 26: Inference for Regression
No notecard for this quiz!!
EQ: How well does the line fit the data?
Section 3.3 Linear Regression
AP Statistics, Section 3.3, Part 1
The Weather Turbulence
Least-Squares Regression
STEM Fair Graphs.
Regression is the Most Used and Most Abused Technique in Statistics
Day 42 – Understanding Correlation Coefficient
Residuals and Residual Plots
STA 291 Summer 2008 Lecture 23 Dustin Lueker.
Correlation and Regression
Adequacy of Linear Regression Models
Least-Squares Regression
y = mx + b Linear Regression line of best fit REMEMBER:
Regression Forecasting and Model Building
Indicator Variables Response: Highway MPG
Warmup A study was done comparing the number of registered automatic weapons (in thousands) along with the murder rate (in murders per 100,000) for 8.
Day 68 Agenda: 30 minute workday on Hypothesis Test --- you have 9 worksheets to use as practice Begin Ch 15 (last topic)
Ch 4.1 & 4.2 Two dimensions concept
Lesson 2.2 Linear Regression.
Regression and Correlation of Data
Ch 9.
STA 291 Spring 2008 Lecture 23 Dustin Lueker.
Dr. Fowler  AFM  Unit 8-5 Linear Correlation
Lesson – Teacher Notes Standard:
Solution to Problem 2.25 DS-203 Fall 2007.
Presentation transcript:

1.8 Critical Analysis: Outliers

1.8.1 Driving Test Scores School is making assumption that the correlation between instructional hours and test scores is indication of instructor’s teaching skills Difficult to prove definitively Reasonable if school has found instructors have consistently strong correlations between time spent with their students and students’ test scores

Analysis: Number of hours is independent variable Scatterplot indicates strong positive linear correlation Linear regression analysis yields line of best fit: y = -0.13x + 80 and r = -0.05! virtually zero linear correlation line of best fit has negative slope: less time instructing -> better test scores? Scatterplot looked favourable, but regression analysis suggests that instructor’s lessons had no positive effect on students’ test results

Funny Data? Outlier at (20, 45) Look at residual plot. Substantially below the others suggests special circumstance may have caused abnormal result on test Illness, emotional upset, really bad driver! Could have been a typo! Bad regression model? Reasonable to exclude this data point when evaluating the driving instructor

Removing outlier and repeating analysis y = 1.2299x + 66.525, r = 0.93 Strong positive linear correlation between instructional hours and test result Instructor may be an effective teacher after all

Note: Outlier had dramatic effect on regression analysis because Distant from other data points Sample size quite small To do a proper evaluation, need Larger set of data More information about the outlier

Outliers Can skew regression analysis Could also simply indicate that data really do have large variations Comprehensive analysis of a set of data should Look for outliers Examine their possible causes Examine their effect on analysis Discuss whether they should be excluded from calculations Outliers have less effect on larger samples