Last lecture summary
Population 2015
Population 2014
mean = 3.3 mean = 3.0
Data 2015
Population: 4, 3, 3, 5, 0, 4, 4, 4, 3, 4, 2, 6, 8, 2, 4, 3, 5, 7, 3, 3
25 samples (n = 3) and their averages: 3.3, 5.3, 3.6, 4.3, 2.3, 3.0, 3.6, 3.0, 5.3, 5.6, 3.3, 4.3, 3.3, 4.0, 5.6, 4.3, 4.3, 4.6, 6.3, 3.3, 4.0, 3.3, 4.6, 3.0, 4.3
2015, n = 3, number of samples = 25
2015, n = 3, number of samples = 50
2015, n = 3, number of samples = 300
2015, n = 3, all possible samples (1540)
2015, n = 5, all possible samples (42 504)
2015, n = 10, all possible samples ( )
Central limit theorem
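The histograms from the last lecture can be reproduced with a minimal sketch like the one below. It assumes that the 20 values listed on the "Data 2015" slide are the whole population; the slide may show only part of it, so the counts and means printed here need not match the figures quoted on the slides. The helper names `all_sample_means` and `spread` are made up for this illustration.

```python
from itertools import combinations
from math import sqrt
from statistics import fmean

# Assumption: the values listed on the "Data 2015" slide form the whole population.
population = [4, 3, 3, 5, 0, 4, 4, 4, 3, 4, 2, 6, 8, 2, 4, 3, 5, 7, 3, 3]


def all_sample_means(values, n):
    """Mean of every possible sample of size n drawn without replacement."""
    return [sum(sample) / n for sample in combinations(values, n)]


def spread(values):
    """Standard deviation of a list of numbers (divides by the list length)."""
    center = fmean(values)
    return sqrt(fmean((v - center) ** 2 for v in values))


print("population mean:", round(fmean(population), 3))
for n in (3, 5, 10):
    means = all_sample_means(population, n)
    print(
        f"n={n}: {len(means)} samples, "
        f"mean of sample means={fmean(means):.3f}, spread={spread(means):.3f}"
    )
```

The mean of all sample means equals the population mean, and the spread of the sample means shrinks as n grows, which is the behaviour the central limit theorem slides illustrate.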
ESTIMATION, CONFIDENCE INTERVALS
Statistical inference
If we can’t conduct a census, we collect data from a sample of the population.
Goal: draw conclusions about that population.
Demonstration
You sample 36 apples from your farm’s harvest of over apples. The mean weight of the sample is 112 grams (with a sample standard deviation of 40 grams). What is the probability that the mean weight of all apples is between 100 and 124 grams?
What is the question?
Slight complication
This is neat!
You sample 36 apples from your farm’s harvest of over apples. The mean weight of the sample is 112 grams (with a sample standard deviation of 40 grams). What is the probability that the population mean weight of all apples is between 100 and 124 grams?
We started with very little information (we know just the sample statistics), but we can infer that, with a probability of 92.82%, the population mean lies within 12 grams of our sample mean!
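The 92.82% figure can be checked with a short calculation. The sketch below assumes, as the slides do, that the sampling distribution of the mean is approximately normal and that the sample standard deviation stands in for the population one; the function name `normal_cdf` is chosen for this illustration.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


n = 36               # number of apples in the sample
sample_sd = 40.0     # sample standard deviation in grams
half_width = 12.0    # 124 - 112 = 112 - 100 grams

# Standard error of the mean: s / sqrt(n)
standard_error = sample_sd / sqrt(n)

# How many standard errors the interval reaches on each side of the sample mean
z = half_width / standard_error

# Probability mass of the standard normal between -z and +z
probability = normal_cdf(z) - normal_cdf(-z)

print(f"standard error = {standard_error:.3f} g")
print(f"z = {z:.2f}")
print(f"probability = {probability:.4f}")
```

The interval reaches 1.8 standard errors to each side, and the resulting probability is about 0.928, in line with the value on the slide (small differences come from rounding in printed z-tables).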
Point vs. interval estimate
You sample 36 apples from your farm’s harvest of over apples. The mean weight of the sample is 112 grams (with a sample standard deviation of 40 grams).
Goal: estimate the population mean.
1. The population mean is estimated as the sample mean, i.e. we say the population mean equals 112 g. This is called a point estimate (bodový odhad).
2. However, we can do better. We can estimate that the true population mean lies, with 95% confidence, within an interval around the sample mean. This is called an interval estimate.
Confidence interval
This type of result is called a confidence interval (interval spolehlivosti, konfidenční interval). The number of standard errors you add to and subtract from the sample mean depends on the confidence level (hladina spolehlivosti), e.g. 95%. That number is the critical value (kritická hodnota); multiplied by the standard error, it gives the margin of error (možná odchylka).
Confidence level
The desired level of confidence is set by the researcher, not determined by the data. If you want to be 95% confident in your result, you add and subtract 1.96 standard errors (the empirical rule says about 2 standard errors).
95% confidence interval (95% interval spolehlivosti)
Confidence level   Z-value
80%                1.28
90%                1.645
95%                1.96
99%                2.576
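The critical values for these confidence levels, and the resulting interval for the apple example, can be computed as in the sketch below. It uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the function names `z_critical` and `z_confidence_interval` are made up for this illustration, and the normal approximation is assumed to be adequate for n = 36.

```python
from math import sqrt
from statistics import NormalDist


def z_critical(confidence_level):
    """Two-sided critical value of the standard normal distribution."""
    tail = (1.0 - confidence_level) / 2.0   # probability left in each tail
    return NormalDist().inv_cdf(1.0 - tail)


def z_confidence_interval(sample_mean, sample_sd, n, confidence_level):
    """Sample mean plus/minus critical value times standard error."""
    margin_of_error = z_critical(confidence_level) * sample_sd / sqrt(n)
    return sample_mean - margin_of_error, sample_mean + margin_of_error


for level in (0.80, 0.90, 0.95, 0.99):
    print(f"{level:.0%}: z = {z_critical(level):.3f}")

# Apple example: n = 36, sample mean 112 g, sample standard deviation 40 g
low, high = z_confidence_interval(112.0, 40.0, 36, 0.95)
print(f"95% confidence interval: ({low:.1f} g, {high:.1f} g)")
```

For the apple sample the 95% interval comes out at roughly 98.9 g to 125.1 g, slightly wider than the 100 to 124 gram range from the demonstration, as expected for a higher confidence level.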
Small sample size confidence intervals
The blood pressure of 7 patients was measured after they had been given a new drug for 3 months. Their blood pressure increased by 1.5, 2.9, 0.9, 3.9, 3.2, 2.1 and 1.9. Construct a 95% confidence interval for the true expected blood pressure increase for all patients in the population.
William Sealy Gosset, aka Student
An employee of the Guinness brewery. His 1908 papers addressed the brewer's concern with small samples:
"The probable error of a mean". Biometrika 6 (1): 1–25. March 1908.
"Probable error of a correlation coefficient". Biometrika 6 (2/3): 302–310. September 1908.
Student t-distribution
Instead of assuming that the sampling distribution is normal, we will use a Student t-distribution. It gives a better estimate of your confidence interval if you have a small sample size. It looks very similar to a normal distribution, but it has fatter tails, reflecting the higher frequency of extreme values that comes with a small data set.
Student t-distribution
Back to our case
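A minimal sketch of the blood pressure calculation follows. The Python standard library has no t-distribution, so the critical value 2.447 (two-sided 95%, 7 - 1 = 6 degrees of freedom) is hard-coded from a standard t-table; that constant and the variable names are assumptions of this illustration, not something shown on the slides.

```python
from math import sqrt
from statistics import mean, stdev

# Blood pressure increases of the 7 patients
increases = [1.5, 2.9, 0.9, 3.9, 3.2, 2.1, 1.9]

# Assumption: two-sided 95% critical value of Student's t with 6 degrees
# of freedom, taken from a standard t-table.
T_CRITICAL_95_DF6 = 2.447

n = len(increases)
sample_mean = mean(increases)
sample_sd = stdev(increases)             # sample standard deviation (divides by n - 1)
standard_error = sample_sd / sqrt(n)
margin_of_error = T_CRITICAL_95_DF6 * standard_error

low = sample_mean - margin_of_error
high = sample_mean + margin_of_error

print(f"sample mean = {sample_mean:.3f}")
print(f"sample standard deviation = {sample_sd:.3f}")
print(f"95% confidence interval: ({low:.2f}, {high:.2f})")
```

With the normal critical value 1.96 in place of 2.447 the interval would be narrower, which is why the t-distribution is the more cautious choice for a sample of only 7 patients.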