Statistics: Statistical Inference
Krishna V. Palem
Kenneth and Audrey Kennedy Professor of Computing
Department of Computer Science, Rice University
Contents
Summary of Statistics Learnt so Far
Statistical Inference
Central Limit Theorem and its implications
Estimation theory
Interval Estimation
What is a Confidence Interval?
Tutorial
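Statistical Inference
Statistical inference is the process of making guesses about the truth from a sample.
[Figure: the truth (not observable) is described by population parameters; the sample (observation) is described by sample statistics, which we use to make guesses about the whole population.]
*The hat notation (^) is often used to indicate an "estimate"; for example, $\hat{\theta}$ denotes an estimate of a parameter $\theta$.
Source: K. Cobb, Stanford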
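Statistical Inference
Population (parameters, e.g., $\mu$ and $\sigma$) → select a sample at random → Sample → collect data from the individuals in the sample → Data → analyse the data (e.g., compute $\bar{x}$ and $s$ to estimate $\mu$ and $\sigma$) to make inferences about the population.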
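The pipeline above can be sketched in a few lines of Python. This is a minimal illustration, not the course's method: the population is a hypothetical one (100,000 values drawn from an exponential distribution with mean 50), built only so that the estimates $\bar{x}$ and $s$ can be compared with the true parameters $\mu$ and $\sigma$, which in a real study are not observable. The sample size of 100 and the seed are arbitrary assumptions.

```python
import random
import statistics

# Hypothetical population (assumption): 100,000 values from a skewed
# exponential distribution with mean 50. In practice the population is not
# observable; we build one here only to compare estimates with the truth.
rng = random.Random(0)
population = [rng.expovariate(1 / 50) for _ in range(100_000)]
mu = statistics.fmean(population)       # population mean (parameter)
sigma = statistics.pstdev(population)   # population standard deviation (parameter)

# Step 1: select a sample at random (without replacement).
sample = rng.sample(population, k=100)

# Step 2: analyse the data; the sample statistics estimate the parameters.
x_bar = statistics.fmean(sample)   # estimate of mu
s = statistics.stdev(sample)       # estimate of sigma (n - 1 denominator)

print(f"mu    = {mu:.2f}, x_bar = {x_bar:.2f}")
print(f"sigma = {sigma:.2f}, s     = {s:.2f}")
```

Running it with a different seed gives a different sample and hence different estimates, while `mu` and `sigma` stay (essentially) the same.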
How close is the Sample Statistic to the Population Parameter?
Population parameters, e.g., $\mu$ and $\sigma$, are fixed. Sample statistics, e.g., $\bar{x}$, vary from sample to sample. How close is $\bar{x}$ to $\mu$?
We cannot answer this question for a particular sample. We can answer it if we can find the distribution that describes the variability of the random variable $\bar{X}$.
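To see that the population mean is fixed while the sample mean is not, the sketch below draws several independent samples from the same population and prints each sample mean and its error. The Normal population with mean 100 and standard deviation 15, the sample size of 25, and the helper `sample_mean` are all hypothetical choices made for illustration.

```python
import random
import statistics

rng = random.Random(1)
# Hypothetical population parameters (assumed known here only for illustration).
MU, SIGMA = 100.0, 15.0
N = 25  # sample size

def sample_mean(n: int) -> float:
    """Draw one random sample of size n from a Normal(MU, SIGMA) population and return its mean."""
    return statistics.fmean(rng.gauss(MU, SIGMA) for _ in range(n))

# mu is fixed; x_bar changes every time we draw a new sample.
means = [sample_mean(N) for _ in range(5)]
for i, m in enumerate(means, 1):
    print(f"sample {i}: x_bar = {m:.2f} (error {m - MU:+.2f})")
```

No single line of output tells us how far off that particular sample is; only the spread of many such values (the distribution of $\bar{X}$) does.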
The Central Limit Theorem
If all possible random samples, each of size $n$, are taken from any population with mean $\mu$ and standard deviation $\sigma$, the sampling distribution of the sample means (averages) will:
1. have mean $\mu_{\bar{x}} = \mu$;
2. have standard deviation $\sigma_{\bar{x}} = \sigma / \sqrt{n}$;
3. be approximately normally distributed regardless of the shape of the parent population (normality improves with larger $n$).
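The three claims of the theorem can be checked by simulation. This sketch assumes an exponential parent population with rate 1 (so $\mu = \sigma = 1$, deliberately far from Normal), uses $n = 30$, and stands in for "all possible random samples" with 20,000 repeated samples; all of these values are illustrative assumptions. The last check uses the fact that about 68.3% of a Normal distribution lies within one standard deviation of its mean.

```python
import math
import random
import statistics

rng = random.Random(42)
# Assumed parent population: Exponential with rate 1, so mu = 1 and sigma = 1.
MU, SIGMA = 1.0, 1.0
n = 30           # size of each sample
reps = 20_000    # number of repeated samples ("all possible samples", approximated)

means = [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]
se = SIGMA / math.sqrt(n)  # theoretical standard deviation of the sample mean

# Claim 1: the mean of the sample means is mu.
print(f"mean of x_bar: {statistics.fmean(means):.3f}  (theory: {MU})")
# Claim 2: the standard deviation of the sample means is sigma / sqrt(n).
print(f"sd of x_bar:   {statistics.stdev(means):.3f}  (theory: {se:.3f})")
# Claim 3 (rough check): about 68% of a Normal lies within one sd of its mean.
within = sum(abs(m - MU) < se for m in means) / reps
print(f"fraction within 1 sd: {within:.3f}  (Normal: about 0.683)")
```

Increasing `n` shrinks the standard deviation of the sample means in proportion to $1/\sqrt{n}$, which is claim 2 seen from the other side.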
What is it really saying?
(1) It gives a relationship between the sample mean and the population mean. This gives us a framework to extrapolate our sample results to the population (statistical inference).
(2) It doesn’t matter what the distribution of the original data is: provided the population has a finite variance, the sample mean will be approximately Normally distributed when $n$ is large. This is why the Normal distribution is so central to statistics.
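Point (1) becomes usable once the sample mean is standardized. Assuming $n$ is large enough for the Normal approximation and $\sigma$ is known, the standardized sample mean is approximately standard Normal, so a fixed fraction of sample means falls within a predictable distance of $\mu$ (1.96 is the 97.5th percentile of $N(0,1)$):

```latex
Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \approx N(0, 1)
\quad\Longrightarrow\quad
P\!\left(\mu - 1.96\,\frac{\sigma}{\sqrt{n}} \;\le\; \bar{X} \;\le\; \mu + 1.96\,\frac{\sigma}{\sqrt{n}}\right) \approx 0.95
```

For example, a single fair die has $\mu = 3.5$ and $\sigma = \sqrt{35/12} \approx 1.708$. With $n = 10$ dice, $\sigma/\sqrt{n} \approx 0.540$, so roughly 95% of 10-dice averages should fall in $3.5 \pm 1.06$, i.e. in about $[2.44,\ 4.56]$.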
Example: Toss 1, 2 or 10 dice (10,000 times)
[Figure: three histograms. Toss 1 die: histogram of data. Toss 2 dice: histogram of averages. Toss 10 dice: histogram of averages.]
The distribution of the data (a single fair die, uniform on 1 to 6) is far from Normal. The distribution of the averages approaches Normal as the sample size (number of dice) increases.
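The dice experiment is easy to reproduce without a plotting library. In this sketch the helpers `averages` and `text_histogram` are hypothetical names; the histogram uses 11 bins of width 0.5 centred on 1.0, 1.5, ..., 6.0, an arbitrary choice that makes the flat, triangular and bell shapes visible. Because a single die only shows whole numbers, the 1-die histogram has empty rows between its six equal bars.

```python
import random
import statistics

rng = random.Random(2024)
TOSSES = 10_000  # number of repetitions, as in the example

def averages(num_dice: int) -> list[float]:
    """Average face value of num_dice fair dice, repeated TOSSES times."""
    return [statistics.fmean(rng.randint(1, 6) for _ in range(num_dice)) for _ in range(TOSSES)]

def text_histogram(values: list[float], width: int = 40) -> None:
    """Print a crude histogram: 11 bins of width 0.5 centred on 1.0, 1.5, ..., 6.0."""
    counts = [0] * 11
    for v in values:
        counts[round((v - 1.0) * 2)] += 1  # averages lie in [1, 6], so index is 0..10
    peak = max(counts)
    for i, c in enumerate(counts):
        print(f"{1.0 + 0.5 * i:3.1f} | {'#' * round(width * c / peak)}")

for k in (1, 2, 10):
    data = averages(k)
    print(f"\n{k} dice: mean = {statistics.fmean(data):.3f}, sd = {statistics.stdev(data):.3f}")
    text_histogram(data)
```

The printed standard deviations should track $1.708/\sqrt{k}$, roughly 1.71, 1.21 and 0.54 for 1, 2 and 10 dice.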
Central Limit Theorem
(3) It describes the distribution of the sample mean. The values of $\bar{x}$ obtained from repeatedly taking samples of size $n$ form a separate population of their own. The distribution of any statistic over repeated samples is called its sampling distribution.
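The idea of a sampling distribution applies to any statistic, not only the mean. The sketch below approximates it generically: draw many samples of size $n$, apply the statistic to each, and keep the results. The function `sampling_distribution`, the fair-die population, $n = 10$ and 5,000 repetitions are all illustrative assumptions.

```python
import random
import statistics
from typing import Callable, Sequence

def sampling_distribution(
    statistic: Callable[[Sequence[float]], float],
    draw: Callable[[], float],
    n: int,
    reps: int,
) -> list[float]:
    """Approximate the sampling distribution of `statistic` for samples of size n.

    `draw` returns one observation from the (assumed) population; the result is
    the list of statistic values computed over `reps` independent samples.
    """
    return [statistic([draw() for _ in range(n)]) for _ in range(reps)]

rng = random.Random(7)

def die() -> int:
    """Population (assumption): one roll of a fair six-sided die."""
    return rng.randint(1, 6)

for name, stat in [("mean", statistics.fmean), ("median", statistics.median), ("max", max)]:
    dist = sampling_distribution(stat, die, n=10, reps=5_000)
    print(f"{name:>6}: centre = {statistics.fmean(dist):.3f}, spread (sd) = {statistics.stdev(dist):.3f}")
```

Each statistic has its own sampling distribution: the mean's is centred near 3.5, while the maximum of 10 dice is concentrated near 6. The Central Limit Theorem describes only the first of these.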