Sociology 5811: Lecture 7: Samples, Populations, The Sampling Distribution Copyright © 2005 by Evan Schofer Do not copy or distribute without permission.



Announcements Problem Set #2 due today!

Review: Populations
Population: The entire set of persons, objects, or events that have at least one common characteristic of interest to a researcher (Knoke, p. 15).
Beyond the literal definition, a population is the general group that we wish to study and gain insight into.
Sample: A subset of a population.
Random Sample: A sample chosen from a population such that each observation has an equal chance of being selected (Knoke, p. 77).
Randomness is one strategy to avoid biased samples.

Review: Statistical Inference
Statistical inference: Making statistical generalizations about a population from evidence contained in a sample (Knoke, p. 77).
When is statistical inference likely to work?
1. When a sample is large. If a sample approaches the size of the population, it is likely to be a good reflection of that population.
2. When a sample is representative of the entire population, as opposed to a sample that is atypical in some way and thus not reflective of the larger group.

Populations and Samples
Population parameters (μ, σ) are constants. There is one true value, but it is usually unknown.
Sample statistics (Y-bar, s) are variables. Up until now we’ve treated them as constants, but there are many possible samples.
The values of the mean and S.D. vary depending on which sample you have.
Like any variable, the mean and S.D. have a distribution, called the “sampling distribution”.
It is made up of the values the statistic takes across all possible samples from a given population.

Populations and Samples: Overview
 | Population | Sample
Characteristics | “parameters” | “statistics”
Characteristics are | constant (one for population) | variables (vary for each sample)
Notation | Greek (μ, σ) | Roman (Y-bar, s)
Estimate | “hat” | “point estimate” based on sample

Population and Sample Distributions
[Figure: a population distribution labeled μ and σ, alongside a sample distribution labeled Y-bar and s]

Estimating the Mean
Suppose we want to know the mean of a population (μ). What do we do?
Plan A: Spend $100 million to survey our entire population, if it is even possible to survey the whole population.
Plan B: Spend $1,000 sampling a few hundred people, and estimate the mean from that sample.
Simply use a formula to estimate mu from the sample.

Estimating the Mean
Question: Given our sample, what is our best guess of the population mean?
Answer: The sample mean, Y-bar.
Look at Y-bar and assume that it is a “good guess”.
Thus, we calculate: $\hat{\mu} = \bar{Y} = \frac{1}{N} \sum_{i=1}^{N} Y_i$

Estimating the Mean
Issue: There are a great many possible samples that one can take from any population.
Each possible sample has a mean, and most of those means differ. Some are close to the population mean, some are not.
Q: How do we know if we got a “good guess”?
A: We can’t know for sure. We may draw incorrect conclusions about the mean.
But: We can use probability theory to determine if our guess is likely to be good!

Estimates and Sampling Distributions
It is possible to take more than one sample, and to calculate more than one estimate of the mean.
If we took many samples (and calculated many means), we’d see a range of estimates.
We could even plot a histogram of the many estimates.
Our confidence in our guess depends on how “spread out” the range of guesses tends to be: the “standard deviation” of that particular histogram.
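The idea of taking many samples and looking at the spread of the resulting estimates can be sketched in a few lines of Python. This is only an illustration: the population below is made up (10,000 values generated with an arbitrary mean of 50 and S.D. of 15), and the sample size and number of samples are arbitrary choices, not values from the lecture.

```python
import random
import statistics

random.seed(5811)  # fixed seed so the sketch gives the same numbers each run

# ASSUMPTION: a made-up population of 10,000 values (not data from the lecture).
population = [random.gauss(50, 15) for _ in range(10_000)]
mu = statistics.fmean(population)

def sample_means(population, n, num_samples):
    """Draw num_samples random samples of size n and return each sample's mean."""
    return [
        statistics.fmean(random.sample(population, n))
        for _ in range(num_samples)
    ]

means = sample_means(population, n=25, num_samples=2_000)

# How "spread out" the estimates are: the S.D. of the list of sample means.
spread = statistics.stdev(means)

print(f"population mean:        {mu:.2f}")
print(f"mean of the estimates:  {statistics.fmean(means):.2f}")
print(f"spread of the estimates: {spread:.2f}")
```

Plotting a histogram of `means` would give the picture described above; the single number `spread` summarizes its width.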

Sampling Distributions
Sampling Distribution: The distribution of estimates created by taking all possible unique samples (of a fixed size) from a population.
Example: Take every possible 10-person sample of sociology graduate students (all combinations).
1. Calculate the mean of each sample.
2. Graph a histogram of all estimates.
This is called “the sampling distribution of the mean”.
Note: The sampling distribution is rarely known. It is typically thought of as a probability distribution.

Sampling Distribution Notation
Population mean and S.D. are: μ, σ
Each sample has a mean and S.D.: Y-bar, s
The sampling distribution of the mean (i.e., the distribution of mean-estimates) also has a mean, and a S.D., aka the “standard error”.
Mean, S.D. of sampling distribution: $\mu_{\bar{Y}}$, $\sigma_{\bar{Y}}$
Question: Why are they Greek? Answer: Because all possible samples represent a population.
Question: Why is there a sub-Y-bar? Answer: Because it is the mean of all possible Y-bars (means).

Sampling Distribution of the Mean
It turns out that under some circumstances, the shape of the sampling distribution of the mean can be determined, thus allowing one to get a sense of the range of estimates of the mean one is likely to see.
If the distribution is narrow, our guess is probably good!
If its S.D. is large, our guess may be quite bad.
This provides insight into the probable location of the population mean, even if you only have one single sample to look at.
This “trick” lets us draw conclusions!

Sampling Distribution Example
Let’s create a sampling distribution from a small population, μ = 52. (Sample N = 3)
Case | # of CDs
1 | 30
2 | 100
3 | 20
4 | 70
5 | 40
Note how the mean varies depending on the sample: mean of cases 1,2,3 = 50; mean of cases 2,4,5 = 70.
For this population (N = 5) we can calculate all possible means based on sample size 3.

Sampling Distribution Example
First, we must calculate every possible mean:
1,2,3 = 50
1,2,4 = 66.67
1,2,5 = 56.67
1,3,4 = 40
1,3,5 = 30
1,4,5 = 46.67
2,3,4 = 63.33
2,3,5 = 53.33
2,4,5 = 70
3,4,5 = 43.33

Sampling Distribution Example
Here, you can see how the sample mean is really a variable.
This complete list of all possible means (one Y-bar per sample) is the sampling distribution.
As a probability distribution, this tells us the probability of picking a sample with each mean.
Note: Sampling distribution mean = 52. Same as the population mean!
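The exhaustive listing in this example can be reproduced with a short Python sketch. One caveat: the five CD counts used below (30, 100, 20, 70, 40) are an assumption, worked back from the sample means and the population mean of 52 quoted on these slides, since those are the only values consistent with them. The names `cds` and `sampling_distribution` are just illustrative.

```python
from itertools import combinations
from statistics import fmean

# ASSUMPTION: CD counts per case, inferred from the quoted sample means
# (e.g. cases 1,2,3 = 50 and cases 2,4,5 = 70) and mu = 52.
cds = {1: 30, 2: 100, 3: 20, 4: 70, 5: 40}

# Every possible sample of 3 cases out of 5, mapped to its mean (Y-bar).
sampling_distribution = {
    cases: fmean(cds[c] for c in cases)
    for cases in combinations(sorted(cds), 3)
}

for cases, y_bar in sampling_distribution.items():
    print(cases, round(y_bar, 2))

population_mean = fmean(cds.values())
mean_of_means = fmean(sampling_distribution.values())
print("population mean:", population_mean)
print("mean of all sample means:", round(mean_of_means, 2))
```

With these values the list has ten entries, and the mean of the ten sample means matches the population mean, as the slide notes.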

Sampling Distribution Example
[Figure: histogram of the sampling distribution (N = 3), with μ = 52 marked]
Note: The distribution centers around the population mean. And, it is roughly symmetrical.

Sampling Distribution Example
As a probability distribution, the sampling distribution gives a sense of the quality of our estimate of μ (here μ = 52).
Probability = Frequency / N, where N here is the number of possible samples (10).
The probability of picking a sample with a mean that is within +/- 5 of μ is p = .3 (30%).
The probability of overestimating μ by more than 15 is about p = .1 (10%).
Q: What is the probability of a “poor” estimate of μ?
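The two probabilities quoted above can be checked by counting samples. This is a sketch under the same assumption as before: the CD counts (30, 100, 20, 70, 40) are inferred from the means reported on the slides, not copied from a surviving table. The helper `probability` is a hypothetical name.

```python
from itertools import combinations
from statistics import fmean

# ASSUMPTION: CD counts for cases 1..5, inferred from the quoted sample means.
cds = [30, 100, 20, 70, 40]
mu = fmean(cds)

# The sampling distribution: the mean of every possible 3-case sample.
means = [fmean(sample) for sample in combinations(cds, 3)]

def probability(event):
    """Probability = frequency / number of possible samples."""
    return sum(1 for m in means if event(m)) / len(means)

p_within_5 = probability(lambda m: abs(m - mu) <= 5)
p_over_by_15 = probability(lambda m: m - mu > 15)

print("P(within +/- 5 of mu):", p_within_5)
print("P(overestimate by more than 15):", p_over_by_15)
```

The same helper can be pointed at any definition of a “poor” estimate, for example `probability(lambda m: abs(m - mu) > 10)`.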

Sampling Distribution Example
Note: If the sampling distribution is narrow, most of our estimates of the mean will be good. That is, they will be close to μ, the population mean.
If the sampling distribution is wide, the probability of a “bad” estimate goes up.
A measure of dispersion can help us assess the sampling distribution.
Recall: The standard deviation of a sampling distribution is called the standard error.
It tells us the width of the sampling distribution!

The Central Limit Theorem
But, how do we know the width of the sampling distribution?
Statisticians have shown that the sampling distribution will have consistent properties if we have a large sample.
Several of these properties constitute the “Central Limit Theorem”.
These properties provide the basis for drawing statistical inferences about the mean.

The Central Limit Theorem
If you have a large sample (large N):
1. The sampling distribution of the mean (and thus all possible estimates of the mean) clusters around the true population mean.
2. The estimates cluster as a normal curve, even if the population distribution is not normal.
3. The estimates are dispersed around the population mean by a knowable standard deviation (sigma over root N): $\sigma_{\bar{Y}} = \sigma_Y / \sqrt{N}$

The Central Limit Theorem Formally stated: 1. As N grows large, the sampling distribution of the mean approaches normality
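A small simulation can make these properties concrete. The sketch below is an illustration under stated assumptions, not part of the lecture: it draws from an exponential distribution with rate 1 (chosen because it is clearly non-normal and has mean 1 and S.D. 1), uses N = 50, and draws each observation directly from that distribution rather than from a finite list.

```python
import math
import random
import statistics

random.seed(7)  # fixed seed for reproducibility

# ASSUMPTION: a skewed population, exponential with rate 1 (mean 1, S.D. 1).
MU, SIGMA = 1.0, 1.0
N = 50            # size of each sample
NUM_SAMPLES = 5_000

means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(N))
    for _ in range(NUM_SAMPLES)
]

se_theory = SIGMA / math.sqrt(N)       # sigma over root N
se_observed = statistics.stdev(means)  # S.D. of the simulated sample means

# Share of sample means within 2 standard errors of mu; close to 0.95
# if the sampling distribution is roughly normal.
within_2se = sum(abs(m - MU) <= 2 * se_theory for m in means) / NUM_SAMPLES

print(f"mean of sample means: {statistics.fmean(means):.3f} (mu = {MU})")
print(f"S.D. of sample means: {se_observed:.3f} (sigma/sqrt(N) = {se_theory:.3f})")
print(f"share within 2 SE:    {within_2se:.3f}")
```

Rerunning with a smaller N (say 3) should show the normal approximation getting worse, which is the reason for the “large N” condition.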

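Central Limit Theorem: Visually
[Figure: a population distribution labeled μ and σ, a sample distribution labeled Y-bar and s, and the sampling distribution of the mean]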

Implications of the C.L.T.
What does this mean for us?
Typically, we only have one sample, and thus only one estimate of μ.
The actual value of μ is unknown, so we don’t know the center of the sampling distribution.
All we know for certain is that our estimate falls somewhere in the sampling distribution. This is always true by definition.
And, later, we’ll estimate its width.

Implications of the C.L.T.
Visually: Suppose we observe mu-hat = 16.
But, mu-hat always falls within the sampling distribution.
[Figure: a sampling distribution drawn at several positions around mu-hat]
There are many possible locations of μ.

Implications of the C.L.T.
We know that the mean from our sample falls somewhere in this sampling distribution, which has mean μ and standard deviation σ over square root N.
If we can estimate σ, we can estimate sigma over root N: the “Standard Error” of the mean.
We don’t know exactly where the sample falls, but laws of probability suggest that we are most likely to draw a sample with a mean near the center.
Recall: In a normal curve, about 68% of cases fall within +/- 1 SD and about 95% within +/- 2 SD.
So, we can determine the range around μ in which 95% (or 99%, or 99.9%) of cases will fall.
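The normal-curve percentages used here can be computed rather than recalled. A minimal sketch using the standard library's `statistics.NormalDist`; the helper names `coverage` and `multiplier` are hypothetical.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal curve: mean 0, S.D. 1

def coverage(k):
    """Share of a normal curve lying within +/- k standard deviations."""
    return z.cdf(k) - z.cdf(-k)

def multiplier(level):
    """Number of S.D.s around the center that contain the given share."""
    return z.inv_cdf(0.5 + level / 2)

for k in (1, 2, 3):
    print(f"within +/- {k} SD: {coverage(k):.4f}")

for level in (0.95, 0.99, 0.999):
    print(f"{level:.1%} of cases fall within +/- {multiplier(level):.2f} SD")
```

This also shows that “2 SD” is a rounding of roughly 1.96 for the 95% range.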

Implications of the C.L.T.
What is the relation between the Standard Error and the size of our sample (N)?
Answer: It is an inverse relationship. The standard deviation of the sampling distribution shrinks as N gets larger.
Formula: $\sigma_{\bar{Y}} = \sigma_Y / \sqrt{N}$
Conclusion: Estimates of the mean based on larger samples tend to cluster closer around the true population mean.
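To see how quickly the standard error shrinks, the sigma-over-root-N formula can be tabulated for a few sample sizes. The population S.D. of 15 below is an arbitrary illustrative value, not one from the lecture.

```python
import math

def standard_error(sigma, n):
    """Standard error of the mean: sigma over root N."""
    return sigma / math.sqrt(n)

SIGMA = 15.0  # ASSUMPTION: arbitrary population S.D. for illustration

for n in (10, 50, 100, 400, 1000):
    print(f"N = {n:>4}: standard error = {standard_error(SIGMA, n):.2f}")
```

Because N sits under a square root, quadrupling the sample size only halves the standard error.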

Implications of the C.L.T.
Visually: The width of the sampling distribution is an inverse function of N (sample size).
The distribution of mean estimates based on N = 10 will be more dispersed. Mean estimates based on N = 50 will cluster closer to μ.
[Figure: a wide sampling distribution labeled “Smaller sample size” next to a narrow one labeled “Larger sample size”]

Confidence Intervals
Benefits of knowing the width of the sampling distribution:
1. You can figure out the general range of error of a given point estimate, based on the range around the true mean within which estimates tend to fall.
2. And, this defines the range around an estimate that is likely to hold the population mean: a “confidence interval”.
Note: These only work if N is large!

Confidence Interval
Confidence Interval: “A range of values around a point estimate that makes it possible to state the probability that an interval contains the population parameter between its lower and upper bounds.” (Bohrnstedt & Knoke, p. 90)
It involves a range and a probability.
Examples:
We are 95% confident that the mean number of CDs owned by grad students is between 20 and 45.
We are 50% confident the mean rainfall this year will be between 12 and 22 inches.

Confidence Interval
Visually: It is probable that μ falls near mu-hat.
[Figure: a curve centered on mu-hat, with the central region labeled “Probable values of μ” and the tails labeled “Range where μ is unlikely to be”]
Q: Can μ be this far from mu-hat?
Answer: Yes, but it is very improbable.

Confidence Interval
To figure out the range of “error” in our mean estimate, we need to know the width of the sampling distribution: the Standard Error! (The S.D. of this distribution)
The Central Limit Theorem provides a formula: $\sigma_{\bar{Y}} = \sigma_Y / \sqrt{N}$
Problem: We do not know the exact value of sigma-sub-Y, the population standard deviation!

Confidence Interval
Question: How do we calculate the standard error if we don’t know the population S.D.?
Answer: We estimate it using the information we have.
Formula for best estimate: $\hat{\sigma}_{\bar{Y}} = s_Y / \sqrt{N}$
Where N is the sample size and s-sub-Y is the sample standard deviation.
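As a sketch, the estimated standard error can be computed directly from a sample. The data below are made up for illustration, and the sample is far too small for the large-N results to apply; it only shows the arithmetic. The code assumes that s is the usual sample standard deviation with N - 1 in the denominator, which is what `statistics.stdev` computes.

```python
import math
import statistics

def estimated_standard_error(sample):
    """Sample standard deviation divided by the square root of the sample size."""
    s = statistics.stdev(sample)  # ASSUMPTION: s uses the N - 1 denominator
    return s / math.sqrt(len(sample))

# ASSUMPTION: made-up sample values, for illustration only.
sample = [2, 4, 4, 4, 5, 5, 7, 9]

y_bar = statistics.fmean(sample)
se_hat = estimated_standard_error(sample)
print(f"Y-bar = {y_bar:.2f}, estimated standard error = {se_hat:.3f}")
```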

95% Confidence Interval Example
Suppose a sample of 100 students with mean SAT score of 1020, standard deviation of 200.
How do we find the 95% Confidence Interval?
If N is large, we know that:
1. The sampling distribution is roughly normal.
2. Therefore 95% of samples will yield a mean estimate within 2 standard deviations (of the sampling distribution) of the population mean (μ).
Thus, 95% of the time, our estimates of μ (Y-bar) are within two “standard errors” of the actual value of μ.
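The claim that about 95% of samples land within two standard errors of μ can be checked by simulation: draw many samples, build an interval of Y-bar plus or minus two estimated standard errors around each, and count how often the interval contains μ. Everything numeric below is an assumption made for illustration (a normal population with μ = 500 and σ = 100, N = 100, 2,000 repetitions); none of it comes from the lecture.

```python
import math
import random
import statistics

random.seed(11)  # fixed seed for reproducibility

# ASSUMPTION: hypothetical population values and simulation settings.
MU, SIGMA, N = 500.0, 100.0, 100
REPS = 2_000

def interval_contains_mu():
    """Draw one sample and report whether Y-bar +/- 2 SE covers MU."""
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    y_bar = statistics.fmean(sample)
    se_hat = statistics.stdev(sample) / math.sqrt(N)
    return y_bar - 2 * se_hat <= MU <= y_bar + 2 * se_hat

coverage = sum(interval_contains_mu() for _ in range(REPS)) / REPS
print(f"share of intervals containing mu: {coverage:.3f}")
```

The share should come out near 0.95, which is what “95% confident” refers to: the procedure captures μ in about 95% of samples.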

95% Confidence Interval
Formula for 95% confidence interval: $\bar{Y} \pm 2\,\sigma_{\bar{Y}}$
Where Y-bar is the mean estimate and sigma-sub-Y-bar is the standard error.
Result: Two values, an upper and a lower bound.
Adding our estimate of the standard error: $\bar{Y} \pm 2\,\frac{s_Y}{\sqrt{N}}$

95% Confidence Interval
Suppose a sample of 100 students with mean SAT score of 1020, standard deviation of 200.
Calculate: $1020 \pm 2 \cdot \frac{200}{\sqrt{100}} = 1020 \pm 2(20) = 1020 \pm 40$
Thus, we are 95% confident that the population mean falls between 980 and 1060.
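The SAT calculation can be written as a small function. The function name and the `multiplier` parameter are hypothetical; the default of 2 follows the rule of thumb used on these slides (a more exact value for 95% would be about 1.96).

```python
import math

def confidence_interval(mean, sd, n, multiplier=2.0):
    """Return (lower, upper) bounds: mean +/- multiplier * sd / sqrt(n)."""
    standard_error = sd / math.sqrt(n)
    margin = multiplier * standard_error
    return mean - margin, mean + margin

# Values from the SAT example: N = 100, Y-bar = 1020, s = 200.
lower, upper = confidence_interval(mean=1020, sd=200, n=100)
print(f"95% confidence interval: {lower:.0f} to {upper:.0f}")
```

Passing `multiplier=1.96` narrows the interval slightly, to roughly 980.8 through 1059.2.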