Stats for Engineers, Lecture 8
1. If the sample mean was larger
2. If you increased your confidence level
3. If you increased your sample size
4. If the population standard deviation was larger
Recap: Confidence intervals for the mean
- Normal data, variance known or large data sample: use normal tables
- Normal data, variance unknown: use t-distribution tables
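The two cases in the recap can be sketched in Python. This is a minimal illustration, not the lecture's own worked example: the sample values are made up, the function names are hypothetical, and the t critical value is passed in by hand as if read from t-distribution tables.

```python
import math
from statistics import NormalDist, mean, stdev

def ci_known_sigma(data, sigma, confidence=0.95):
    """Interval for the mean when the population standard deviation is known."""
    n = len(data)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    half_width = z * sigma / math.sqrt(n)
    m = mean(data)
    return m - half_width, m + half_width

def ci_unknown_sigma(data, t_value):
    """Interval for the mean using the sample standard deviation.

    t_value is the two-sided critical value from t tables
    with n - 1 degrees of freedom.
    """
    n = len(data)
    half_width = t_value * stdev(data) / math.sqrt(n)
    m = mean(data)
    return m - half_width, m + half_width

sample = [9.8, 10.2, 10.4, 9.9, 10.1]  # made-up illustrative data
known = ci_known_sigma(sample, sigma=0.25)
unknown = ci_unknown_sigma(sample, t_value=2.776)  # t tables: 4 d.o.f., 95%
print(known, unknown)
```

In both functions the half-width scales with the standard deviation and the critical value, and shrinks like 1/sqrt(n); the sample mean only shifts the centre of the interval.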
Normal t-distribution
Linear regression
Sample means
Equation of the fitted line is
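As a sketch of the least-squares fit, the standard estimates are slope = S_xy / S_xx and intercept = ybar - slope * xbar, where S_xx and S_xy are sums of squared and cross deviations from the sample means. The names `fit_line`, `s_xx` and `s_xy` and the data values below are illustrative assumptions, not taken from the slides.

```python
def fit_line(xs, ys):
    """Least-squares straight line y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n  # sample mean of x
    y_bar = sum(ys) / n  # sample mean of y
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope

xs = [1.0, 2.0, 3.0, 4.0, 5.0]   # made-up illustrative data
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
intercept, slope = fit_line(xs, ys)
print(intercept, slope)
```

The fitted line always passes through the point of sample means (xbar, ybar), which is why the intercept follows directly from the slope.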
Quantifying the goodness of the fit
Residual sum of squares
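A minimal sketch of the residual sum of squares, RSS = sum of (y_i - fitted y_i)^2, together with the usual variance estimate RSS / (n - 2) for a straight-line fit. The data and the function name `residual_sum_of_squares` are assumptions for illustration only.

```python
def residual_sum_of_squares(xs, ys):
    """Fit a least-squares line, then return (RSS, estimated variance)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    # Residual: observed y minus the fitted value on the line
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    sigma2_hat = rss / (n - 2)  # two fitted parameters, so n - 2 degrees of freedom
    return rss, sigma2_hat

xs = [1.0, 2.0, 3.0, 4.0, 5.0]   # made-up illustrative data
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
rss, sigma2_hat = residual_sum_of_squares(xs, ys)
print(rss, sigma2_hat)
```

A small RSS relative to the spread of y means the points lie close to the line; the variance estimate feeds into any error bar on predictions.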
Predictions
Confidence interval for mean y at given x
What is the error bar?
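One way to sketch the error bar on the mean of y at a given x uses the standard result that the fitted value has variance sigma^2 * (1/n + (x - xbar)^2 / S_xx), with sigma^2 estimated by RSS / (n - 2) and a t critical value with n - 2 degrees of freedom. The data, the function name and the hand-entered table value are assumptions.

```python
import math

def mean_response_interval(xs, ys, x0, t_value):
    """Confidence interval for the mean of y at x = x0 from a straight-line fit."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    rss = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    sigma2_hat = rss / (n - 2)
    y0 = intercept + slope * x0
    # The error bar grows as x0 moves away from the sample mean of x
    std_err = math.sqrt(sigma2_hat * (1 / n + (x0 - x_bar) ** 2 / s_xx))
    return y0 - t_value * std_err, y0 + t_value * std_err

xs = [1.0, 2.0, 3.0, 4.0, 5.0]   # made-up illustrative data
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
T_95_3DOF = 3.182                # t tables: 3 d.o.f., 95% two-sided
near = mean_response_interval(xs, ys, 3.0, T_95_3DOF)
far = mean_response_interval(xs, ys, 10.0, T_95_3DOF)
print(near, far)
```

The interval is narrowest at x = xbar and widens with distance from it, although the formula still assumes the straight-line model holds at x0.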
Example: The data y has been observed for various values of x, as follows (table of y against x not recovered). Fit the simple linear regression model using least squares.
Recall fit was
Extrapolation: predictions outside the range of the original data
Looks OK!
Quite wrong! Extrapolation is often unreliable unless you are sure a straight line is a good model.
What about the distribution of future data points themselves?
Confidence interval for a prediction
Two effects:
- Uncertainty in the estimated mean y at the given x
- Variance of individual points about the mean
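The difference between the two kinds of interval can be sketched from summary statistics. For a new data point, the standard result adds the scatter of individual points (sigma^2) to the variance of the estimated mean. The numerical values and the function name `standard_errors` are hypothetical.

```python
import math

def standard_errors(x0, n, x_bar, s_xx, sigma2_hat):
    """Standard errors at x0 for (mean of y, a single future observation)."""
    var_of_mean = sigma2_hat * (1 / n + (x0 - x_bar) ** 2 / s_xx)
    se_mean = math.sqrt(var_of_mean)
    # A future point also scatters about the mean with variance sigma^2
    se_new_point = math.sqrt(sigma2_hat + var_of_mean)
    return se_mean, se_new_point

# Made-up summary statistics for illustration
n, x_bar, s_xx, sigma2_hat = 5, 3.0, 10.0, 0.04
se_mean, se_new_point = standard_errors(3.0, n, x_bar, s_xx, sigma2_hat)
print(se_mean, se_new_point)
```

Multiplying either standard error by the t critical value with n - 2 degrees of freedom gives the half-width of the corresponding interval. The interval for a future point is always the wider one, and it does not shrink to zero however many points are used in the fit.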
Confidence interval for mean y at given x
- Extrapolation is often unreliable, e.g. the linear model may well not hold at below-freezing temperatures. The confidence interval is unreliable at T = -20.
Correlation
Regression tries to model the linear relation between mean y and x.
Correlation measures the strength of the linear association between y and x.
Weak correlation / Strong correlation: same linear regression fit (with different confidence intervals)
If x and y are negatively correlated:
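More convenient if the result is independent of units (dimensionless number).
Define the Pearson product-moment correlation coefficient
r = S_xy / sqrt(S_xx S_yy),
where S_xy = sum of (x_i - xbar)(y_i - ybar), S_xx = sum of (x_i - xbar)^2 and S_yy = sum of (y_i - ybar)^2.
- r = 1: there is a line with positive slope going through all the points;
- r = -1: there is a line with negative slope going through all the points;
- r = 0: there is no linear association between y and x.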
Notes: - magnitude of r measures how noisy the data is, but not the slope
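A short sketch of the Pearson product-moment coefficient, r = S_xy / sqrt(S_xx * S_yy), showing that it is dimensionless and does not depend on the slope. The data are made up and `pearson_r` is a name chosen here for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return s_xy / math.sqrt(s_xx * s_yy)

xs = [1.0, 2.0, 3.0, 4.0, 5.0]   # made-up illustrative data
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
r = pearson_r(xs, ys)

# Changing the units of y (and hence the slope) leaves r unchanged
r_rescaled = pearson_r(xs, [100.0 * y for y in ys])

# Points exactly on a line with negative slope
r_negative = pearson_r(xs, [10.0 - 2.0 * x for x in xs])
print(r, r_rescaled, r_negative)
```

Rescaling y multiplies the fitted slope by 100 but gives the same r, which is the sense in which r measures scatter about the line rather than steepness.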
Correlation
A researcher found that r = between the high temperature of the day and the number of ice cream cones sold in Brighton. What does this information tell us?
1. Higher temperatures cause people to buy more ice cream.
2. Buying ice cream causes the temperature to go up.
3. Some extraneous variable causes both high temperatures and high ice cream sales.
4. Temperature and ice cream sales have a strong positive linear relationship.
Question from Murphy et al.
Correlation
Error on the estimated correlation coefficient?
- not easy; possibilities include subdividing the points and assessing the spread in r values.
J Polit Econ. 2008; 116(3): 499–532.
Strong evidence for a 2-3% correlation.
- this doesn't mean being tall causes you to earn more (though it could)
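The idea of subdividing the points and looking at the spread in r values could be sketched as follows. This is only one rough way to do it: the data are simulated, the number of subsets `k` is an arbitrary choice, and the helper names are hypothetical.

```python
import math
import random
from statistics import stdev

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return s_xy / math.sqrt(s_xx * s_yy)

def subset_r_values(xs, ys, k, seed=0):
    """Randomly split the points into k disjoint subsets; return r for each."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    groups = [indices[i::k] for i in range(k)]
    return [pearson_r([xs[i] for i in g], [ys[i] for i in g]) for g in groups]

# Simulated data with a moderate linear association (illustration only)
rng = random.Random(1)
xs = [rng.gauss(0, 1) for _ in range(200)]
ys = [0.5 * x + rng.gauss(0, 1) for x in xs]

r_values = subset_r_values(xs, ys, k=5)
spread = stdev(r_values)
print(pearson_r(xs, ys), r_values, spread)
```

Each subset has fewer points than the full sample, so the spread among the subset values tends to overstate the uncertainty on the full-sample r; it gives a rough scale rather than a formal error bar.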