Critical Analysis
The following data represents 5 students from a class of 20.

Height (cm)   Grade
161           49
170           66
182           73
199           87
189           88

a) Create a scatter plot to model the data.
b) Do a regression on this data to calculate the line of best fit.
c) Calculate the correlation coefficient.
d) Does your model seem appropriate?
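Parts b) and c) can be checked numerically. The following is a minimal sketch in Python, assuming NumPy is available; the variable names are illustrative and not part of the lesson.

```python
import numpy as np

# The 5-student sample from the table above.
height = np.array([161, 170, 182, 199, 189], dtype=float)
grade = np.array([49, 66, 73, 87, 88], dtype=float)

# Least-squares line of best fit: grade = slope * height + intercept.
slope, intercept = np.polyfit(height, grade, 1)

# Pearson correlation coefficient between height and grade.
r = np.corrcoef(height, grade)[0, 1]

print(f"grade = {slope:.3f} * height + {intercept:.2f}")
print(f"r = {r:.4f}")
```

For this sample r comes out at roughly 0.95, which makes the linear model look far more convincing than five data points can justify.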
The rest of the class data is as follows:

Height (cm)   Grade
162           65
170           66
155           82
158           88
160           81
161           49
182           87
189           73
192           52
199           151
153           74
168           56
177           46
178           45
164           41
169           85
175           54

[Scatter plot: Height vs. Grade for the full class]

Notice the correlation coefficient is only 0.0104.
What went wrong?
The sample size was too small. A sample that is too small can make predictions based on the model invalid.
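The effect of sample size can be illustrated with a small simulation. This is a sketch under stated assumptions: the data is synthetic (two unrelated random variables, not the class data), NumPy is assumed to be available, and `mean_abs_r` is a hypothetical helper name.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable


def mean_abs_r(sample_size, trials=2000):
    """Average |r| over many samples of two unrelated variables."""
    total = 0.0
    for _ in range(trials):
        x = rng.normal(size=sample_size)
        y = rng.normal(size=sample_size)  # generated independently of x
        total += abs(np.corrcoef(x, y)[0, 1])
    return total / trials


small = mean_abs_r(5)    # samples the size of the 5-student example
large = mean_abs_r(20)   # samples the size of the whole class

print(f"mean |r| with n=5:  {small:.2f}")
print(f"mean |r| with n=20: {large:.2f}")
```

Even though the two variables are unrelated by construction, samples of 5 show a noticeably larger |r| on average than samples of 20, so a strong-looking correlation in a tiny sample can arise by chance alone.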
The following data shows the average salary in the NHL since 2000.

Year   Salary
2000   $1,356,380
2001   $1,434,885
2002   $1,642,590
2003   $1,790,209
2004   $1,830,126
2005   $1,830,126
2006   $1,460,000
2007   $1,708,607
2008   $1,906,793
2009   $2,126,843

a) Create a scatter plot to model this data.
b) Do a regression on this data to calculate the line of best fit.
c) Calculate the correlation coefficient.
d) Does your model seem appropriate?
Notice that the fit is weak: r² is only 0.5391 (a correlation coefficient of r ≈ 0.73).
Notice what happens when we split the data and create separate scatter plots for the years 2000-2005 and 2005-2010. The correlation coefficients are quite high for both of these graphs.
Why did this happen? In 2005 there was a league-wide lockout and a salary cap was introduced, bringing salaries down. After that year, salaries began to rise again.
This is an example of a hidden variable.
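The split can be reproduced numerically. The sketch below assumes NumPy is available and, as a labeled assumption, places the break between the 2005 and 2006 rows, since that is where the drop appears in the table; the graphs in the lesson may place the 2005 point differently. The helper `corr` is a hypothetical name.

```python
import numpy as np

# Average NHL salary by year, from the table in this lesson.
year = np.arange(2000, 2010)
salary = np.array([1356380, 1434885, 1642590, 1790209, 1830126,
                   1830126, 1460000, 1708607, 1906793, 2126843], dtype=float)


def corr(x, y):
    """Pearson correlation coefficient of two 1-D arrays."""
    return np.corrcoef(x, y)[0, 1]


# Assumption: the salary-cap effect first shows up in the 2006 row.
before = year <= 2005
after = ~before

r_all = corr(year, salary)
r_before = corr(year[before], salary[before])
r_after = corr(year[after], salary[after])

print(f"all years:  r = {r_all:.3f}, r^2 = {r_all ** 2:.4f}")
print(f"2000-2005:  r = {r_before:.3f}")
print(f"2006-2009:  r = {r_after:.3f}")
```

On this data the full-range r is about 0.73, and its square matches the 0.5391 quoted for the whole data set, while each segment on its own has r well above 0.9. The single regression line is weak only because it ignores the lockout.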
Questions to ask when performing critical analysis
Is the sampling free of bias?
Could outliers influence the results?
Are there unusual patterns which suggest a hidden variable?
Has causality been inferred with only correlation evidence?
Homework/Practice p.209-211 #1-3,5,6,8