Introduction to Inference

Sampling Distribution of Means & Central Limit Theorem
Dr. Amjad El-Shanti, MD, PMH, Dr PH
University of Palestine, 2016

Course Overview
Diagram labels: Exploring Data; Inference; Collecting Data; Probability; Intro. Inference; Comparing Variables; Relationships between Variables; Means; Proportions; Regression; Contingency Tables.

Inference with a Single Observation
Population parameter: μ (unknown). Sampling leads from the population to an observation Xi; inference leads from the observation back to the parameter.
Each observation Xi in a random sample is a representative of the unobserved values in the population.
How different would this observation be if we took a different random sample?

Normal Distribution
Last class, we learned the Normal distribution as a model for our overall population.
With it, we can calculate the probability of getting observations greater than or less than any value.
Usually, though, we don't have a single observation, but instead the mean of a set of observations.

Inference with Sample Mean
Population parameter: μ (unknown). Sampling leads from the population to the sample statistic x̄; inference (estimation) leads from x̄ back to the parameter.
The sample mean x̄ is our estimate of the population mean μ.
How much would the sample mean change if we took a different sample?
Key to this question: the sampling distribution of x̄.

Sampling Distribution of Sample Mean
The sampling distribution is the distribution of values taken by the statistic in all possible samples of size n from the same population.
Model assumption: our observations xi are sampled from a population with mean μ and variance σ².
Diagram: a population with unknown parameter μ yields Sample 1, Sample 2, ..., Sample 8, and so on, each of size n and each with its own sample mean x̄. What is the distribution of these values?

Mean of Sample Mean
First, we examine the center of the sampling distribution of the sample mean.
The center of the sampling distribution of the sample mean is the unknown population mean: mean(x̄) = μ.
Over repeated samples, the sample mean will, on average, be equal to the population mean, but there are no guarantees for any one sample!

Variance of Sample Mean
Next, we examine the spread of the sampling distribution of the sample mean.
The variance of the sampling distribution of the sample mean is variance(x̄) = σ²/n.
As the sample size increases, the variance of the sample mean decreases!
Averaging over many observations is more accurate than just looking at one or two observations.
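The two facts about center and spread can be checked with a small simulation. The sketch below is illustrative only: it assumes a Normal population with hypothetical values μ = 10 and σ = 3 (not from the slides), and `simulate_sample_means` is a helper name invented for this example. It draws many samples of size n = 1 and n = 10 and compares the mean and variance of the resulting sample means with μ and σ²/n.

```python
import random
import statistics


def simulate_sample_means(n, num_samples, mu, sigma, rng):
    """Return the means of num_samples random samples of size n from Normal(mu, sigma)."""
    return [
        statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
        for _ in range(num_samples)
    ]


# Hypothetical population values, chosen only for illustration.
MU, SIGMA = 10.0, 3.0
rng = random.Random(0)  # fixed seed so the run is repeatable

results = {}
for n in (1, 10):
    means = simulate_sample_means(n, 20_000, MU, SIGMA, rng)
    # Center and spread of the simulated sampling distribution.
    results[n] = (statistics.fmean(means), statistics.pvariance(means))
    print(
        f"n = {n:2d}: mean of sample means = {results[n][0]:.3f}, "
        f"variance of sample means = {results[n][1]:.3f} "
        f"(theory: {SIGMA ** 2 / n:.3f})"
    )
```

In both cases the mean of the sample means should sit close to μ, while the variance for n = 10 should be roughly one tenth of the variance for n = 1.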

Comparing the sampling distribution of the sample mean when n = 1 vs. n = 10

Law of Large Numbers
Remember the Law of Large Numbers: if one draws independent observations from a population with mean μ, then as the number of observations increases, the sample mean x̄ gets closer and closer to the population mean μ.
This is easier to see now, since we know that mean(x̄) = μ and variance(x̄) = σ²/n, which goes to 0 as n gets large.
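A minimal sketch of the Law of Large Numbers in action, under assumptions that are not from the slides: the population is taken to be exponential with mean 2 (a hypothetical skewed population), and the seed is arbitrary. The code computes the sample mean over the first n draws for increasing n.

```python
import random
import statistics

# Hypothetical skewed population: exponential with mean 2 (rate 1/2).
POP_MEAN = 2.0
rng = random.Random(1)  # fixed seed so the run is repeatable
draws = [rng.expovariate(1 / POP_MEAN) for _ in range(100_000)]

running = {}
for n in (10, 100, 1_000, 10_000, 100_000):
    # Sample mean of the first n independent observations.
    running[n] = statistics.fmean(draws[:n])
    print(f"n = {n:6d}: sample mean = {running[n]:.4f} (population mean = {POP_MEAN})")
```

The early sample means can be noticeably off, but the gap to the population mean should shrink as n grows, as the variance formula σ²/n suggests.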

Example
Population: seasonal home-run totals for 7032 baseball players from 1901 to 1996.
Take different samples from this population and compare the sample mean we get each time.
In real life, we can't do this because we don't usually have the entire population!

Sample Size                     Mean                  Variance
100 samples of size n = 1       3.69                  46.8
100 samples of size n = 10      4.43                  (missing in transcript)
100 samples of size n = 100     4.42                  0.43
100 samples of size n = 1000    (missing in transcript)   0.06

Population parameter: μ = 4.42

Distribution of Sample Mean
We now know the center and spread of the sampling distribution for the sample mean.
What about the shape of the distribution?
If our data x1, x2, …, xn follow a Normal distribution, then the sample mean x̄ will also follow a Normal distribution!

Example: Mortality in US cities (deaths/100,000 people)
This variable seems to approximately follow a Normal distribution, so the sample mean will also approximately follow a Normal distribution.

Central Limit Theorem
What if the original data doesn't follow a Normal distribution?
Figure: HR/Season for sample of baseball players.
If the sample is large enough, it doesn't matter!

Central Limit Theorem
If the sample size is large enough, then the sample mean x̄ has an approximately Normal distribution.
This is true no matter what the shape of the distribution of the original data!
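One rough way to see the Central Limit Theorem at work is to track the skewness of the sample means, which is 0 for a perfectly Normal distribution. The sketch below assumes a hypothetical right-skewed population (exponential with mean 1, not the baseball data from the slides); `skewness` and `sample_means` are helper names invented for this example.

```python
import random
import statistics


def skewness(values):
    """Sample skewness: mean cubed deviation divided by the cubed standard deviation."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([(v - m) ** 3 for v in values]) / sd ** 3


def sample_means(n, num_samples, rng):
    """Means of num_samples samples of size n from an exponential population (mean 1)."""
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(num_samples)
    ]


rng = random.Random(2)  # fixed seed so the run is repeatable
skew_by_n = {n: skewness(sample_means(n, 10_000, rng)) for n in (1, 5, 30)}

for n, skew in skew_by_n.items():
    print(f"n = {n:2d}: skewness of sample means = {skew:.2f}")
```

With n = 1 the sample means are just the raw, strongly skewed observations; as n grows, the skewness should move toward 0, which is consistent with the distribution of x̄ becoming closer to Normal.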

CENTRAL LIMIT THEOREM
The Central Limit Theorem specifies a theoretical distribution formulated by the selection of all possible random samples of a fixed size n.
A sample mean is calculated for each sample, and the distribution of these sample means is considered.

SAMPLING DISTRIBUTION OF THE MEAN
The mean of the sample means is equal to the mean of the population from which the samples were drawn.
The standard deviation of the distribution is σ divided by the square root of n (the standard error); equivalently, its variance is σ²/n.

STANDARD ERROR
Standard deviation of the sampling distribution of means: σ(x̄) = σ/√n

How Large is Large?
If the population is normal, then the sampling distribution of x̄ will also be normal, no matter what the sample size.
When the population is approximately symmetric, the sampling distribution of x̄ becomes approximately normal for relatively small values of n.
When the population is skewed, a rule of thumb is that the sample size must be at least 25 before the sampling distribution of x̄ becomes approximately normal.

Central Limit Theorem
The CLT states that for randomly selected samples of size n (n should be at least 25, but the larger n, the better the approximation) from a population with mean μ and standard deviation σ:
The mean of the distribution of sample means is equal to the mean of the population distribution: μ(x̄) = μ.
The standard deviation of the distribution of sample means is equal to the standard deviation of the population divided by the square root of the sample size: σ(x̄) = σ/√n.
The distribution of sample means is approximately normal, regardless of whether the population distribution is normal or not.

EXAMPLE A certain brand of tires has a mean life of 25,000 miles with a standard deviation of 1600 miles. What is the probability that the mean life of 64 tires is less than 24,600 miles?

Example continued
The sampling distribution of the means has a mean of 25,000 miles (the population mean), μ = 25,000 mi., and a standard deviation (i.e., standard error) of 1600/√64 = 1600/8 = 200 mi.

Example continued
Convert 24,600 mi. to a z-score and use the normal table to determine the required probability.
z = (24,600 − 25,000)/200 = −2
P(z < −2) = 0.0228, so 2.28% of the sample means will be less than 24,600 mi.
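The tire calculation can be reproduced without a normal table. This is a minimal sketch using Python's standard library; the numbers (mean 25,000, standard deviation 1600, n = 64, threshold 24,600) come from the example, while the variable names are chosen here for illustration.

```python
import math
from statistics import NormalDist

mu = 25_000         # population mean tire life (miles), from the example
sigma = 1_600       # population standard deviation (miles), from the example
n = 64              # number of tires in the sample
threshold = 24_600  # sample-mean value of interest (miles)

# Standard error of the sample mean: sigma / sqrt(n).
standard_error = sigma / math.sqrt(n)

# z-score of the threshold within the sampling distribution of the mean.
z = (threshold - mu) / standard_error

# Probability that a standard Normal variable falls below z.
probability = NormalDist().cdf(z)

print(f"standard error = {standard_error:.0f} mi, z = {z:.1f}, P = {probability:.4f}")
```

This relies on the sampling distribution of the mean being approximately Normal, which the Central Limit Theorem supports for a sample of 64 tires.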

ESTIMATION OF POPULATION VALUES
Point Estimates
Interval Estimates

CONFIDENCE INTERVAL ESTIMATES for LARGE SAMPLES
The sample has been randomly selected.
The population standard deviation is known, or the sample size is at least 25.

Confidence Interval Estimate of the Population Mean
x̄ ± z · s/√n
x̄: sample mean
s: sample standard deviation
n: sample size
z: standard Normal critical value for the chosen confidence level (1.96 for 95%)

EXAMPLE
Estimate, with 95% confidence, the mean lifetime of nine-volt batteries using a randomly selected sample where:
x̄ = 49 hours
s = 4 hours
n = 36

EXAMPLE continued
Lower limit: 49 − (1.96)(4/6) = 49 − 1.3 = 47.7 hrs
Upper limit: 49 + (1.96)(4/6) = 49 + 1.3 = 50.3 hrs
We are 95% confident that the mean lifetime of the population of batteries is between 47.7 and 50.3 hours.
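The battery interval can be recomputed in a few lines. This is a sketch rather than a definitive implementation: `z_confidence_interval` is a hypothetical helper written for this example, and it obtains the critical value from the standard Normal distribution instead of hard-coding 1.96.

```python
import math
from statistics import NormalDist


def z_confidence_interval(sample_mean, sample_sd, n, confidence=0.95):
    """Large-sample interval: sample_mean +/- z * sample_sd / sqrt(n)."""
    # Two-sided critical value, about 1.96 when confidence is 0.95.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * sample_sd / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin


# Values from the battery example: mean 49 hours, s = 4 hours, n = 36.
lower, upper = z_confidence_interval(49, 4, 36)
print(f"95% CI: {lower:.1f} to {upper:.1f} hours")
```

Changing the `confidence` argument, for example to 0.99, widens the interval, since a larger critical value multiplies the same standard error.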
