Sociology 601: Class 5, September 15, 2009


Sociology 601: Class 5, September 15, 2009 Overview Homeworks Stata & Review standard errors Chapter 5 Point estimation. (A&F 5.1) Confidence intervals… for a population mean (A&F 5.2) for a population proportion (A&F 5.3) Choosing a sufficient sample size (A&F 5.4)

What we have accomplished with sampling distributions Given a population parameter, we know that a sample statistic will produce a better estimate of the population parameter when the sample is larger. (“Better” means that the estimate is more accurate and that its sampling distribution is closer to normal.) We know what we are doing at a qualitative level.

What’s next We will take it to a quantitative level: How good is a given estimate from a given sample? We will go over formal language and equations for using sample statistics to make inferences for population parameters. Once we have equations for predicting a population mean and standard deviation, we will discuss formal language for defining an interval estimate, a guess of a range of potential values for the population parameter, based on the sample.

5.1: Estimation: definitions Point estimate: a single number, calculated from a set of data, that is the best guess for the parameter. Point estimator: the equation used to produce the point estimate. (Common notation: put a “hat” on the parameter.) Interval estimate: a range of numbers around the point estimate within which the parameter is believed to fall. Also called a confidence interval.

The basics of point estimation The typical point estimator of a population mean is a sample mean: Ybar = sum(Y_i) / n The typical point estimator of a population proportion is a sample proportion: pihat = (number of observations in the category) / n Q: is this a point estimator of a mean?

Point estimators for standard deviations. Estimated standard deviation of observations in a population: s = sqrt { sum[(Y_i - Ybar)^2] / (n-1) }

Typical point estimators for standard errors. Estimated standard error of a sample mean, for samples drawn from a population: se = s / sqrt(n) Special case: estimated standard error of a sample proportion: se = sqrt { [pihat * (1-pihat)] / n }

Choosing a good estimator You can technically use any equation you want as a point estimator, but the most popular ones have certain desirable properties. Unbiasedness: The sampling distribution for the estimator ‘centers’ around the parameter. (On average, the estimator gives the correct value for the parameter.) Efficiency: If at the same sample size one unbiased estimator has a smaller sampling error than another unbiased estimator, the first one is more efficient. Consistency: The value of the estimator gets closer to the parameter as sample size increases. Consistent estimators may be biased, but the bias must become smaller as the sample size increases if the consistency property holds true.
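One way to see what unbiasedness means is to enumerate every possible sample from a tiny population and average each estimator over all of them. The sketch below assumes a made-up population of three values and samples of size 2 drawn with replacement; the population, sample size, and function name are illustrative and do not come from the lecture.

```python
from itertools import product
from statistics import mean

population = [1, 2, 3]   # hypothetical population, chosen only for illustration
n = 2                    # sample size

mu = mean(population)                                  # population mean
sigma2 = mean([(y - mu) ** 2 for y in population])     # population variance

# Every ordered sample of size n drawn with replacement (all equally likely).
samples = list(product(population, repeat=n))

def variance_with_divisor(sample, divisor):
    """Sum of squared deviations from the sample mean, divided by `divisor`."""
    ybar = sum(sample) / len(sample)
    return sum((y - ybar) ** 2 for y in sample) / divisor

# Average of each estimator over all possible samples.
avg_mean = mean([sum(s) / n for s in samples])
avg_var_n1 = mean([variance_with_divisor(s, n - 1) for s in samples])
avg_var_n = mean([variance_with_divisor(s, n) for s in samples])

print("population mean:", mu, "average sample mean:", avg_mean)
print("population variance:", sigma2)
print("average variance estimate, n-1 divisor:", avg_var_n1)
print("average variance estimate, n divisor:  ", avg_var_n)
```

In this toy setting the sample mean and the n-1 variance estimator average out to the population values, while the n-divisor version averages too low, which is the sense in which it is biased.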

Examples for point estimates: Given the following sample of seven observations: 5,2,5,2,4,5,5 What is the estimator of the population mean? What is the estimate of the population mean? What is the estimator of the population standard error? What is the estimate of the population standard error for this sample? What is the estimate of the population proportion with a value of 5 or greater? What is the estimate of the population standard error for the proportion with a value 5 or greater?

Examples for point estimates: Given the following sample of seven observations: 5,2,5,2,4,5,5
What is the estimator of the population mean? Ybar = sum(Y_i) / n
What is the estimate of the population mean? (5+2+5+2+4+5+5) / 7 = 28 / 7 = 4
What is the estimator of the population standard error? se = s / sqrt(n)
What is the estimate of the population standard error for this sample?
= sqrt { [(5-4)^2+(2-4)^2+(5-4)^2+(2-4)^2+(4-4)^2+(5-4)^2+(5-4)^2] / (7-1) } / sqrt(7)
= sqrt { [1 + 4 + 1 + 4 + 0 + 1 + 1] / 6 } / sqrt(7)
= sqrt(2) / sqrt(7) = 1.41 / 2.65 = 0.53

Examples for point estimates: Given the following sample of seven observations: 5,2,5,2,4,5,5
What is the estimate of the population proportion with a value of 5 or greater? = 4 / 7 = .57
What is the estimate of the population standard error for the proportion with a value of 5 or greater?
= sqrt(.57 * (1-.57)) / sqrt(7) = sqrt(.57 * .43) / sqrt(7) = sqrt(.245) / sqrt(7) = .495 / 2.65 = .187
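The hand calculations for the seven-observation sample can be checked with a few lines of code. This is a minimal sketch using only the Python standard library; the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

sample = [5, 2, 5, 2, 4, 5, 5]
n = len(sample)

ybar = mean(sample)        # point estimate of the population mean
s = stdev(sample)          # sample standard deviation (n - 1 divisor)
se_mean = s / sqrt(n)      # estimated standard error of the mean

# Proportion of observations with a value of 5 or greater, and its standard error.
p_hat = sum(1 for y in sample if y >= 5) / n
se_prop = sqrt(p_hat * (1 - p_hat) / n)

print("mean:", ybar)
print("standard error of the mean:", round(se_mean, 2))
print("proportion >= 5:", round(p_hat, 2))
print("standard error of the proportion:", round(se_prop, 3))
```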

5.2: interval estimates: Interval estimate (also called a confidence interval): a range of numbers that we think has a given probability of containing a parameter. Confidence coefficient: The probability that the interval estimate contains the parameter. Typical confidence coefficients are .95 and .99. We usually are told the desired confidence coefficient, then asked to find the interval estimate appropriate for the confidence coefficient.

95% confidence interval for a sample mean: Example of confidence interval.
95% confidence interval for a sample mean: example using age from IHDS:
. summarize age
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         age |    215754    27.34663    19.34841          0        116
. ci age
    Variable |       Obs        Mean    Std. Err.       [95% Conf. Interval]
-------------+---------------------------------------------------------------
         age |    215754    27.34663    .0416549        27.26499    27.42827
Q: how is std. err. of age calculated?
Q: assumptions?
ci = Ybar +- invnorm(1-p/2) * s / sqrt(N)
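As one way to answer the question about the standard error, the sketch below recomputes it from the `summarize` output as s / sqrt(N) and then applies the formula on the slide. It uses Python's `statistics.NormalDist` as a stand-in for Stata's `invnorm`. With N this large, the normal quantile gives the same interval as Stata reports to the digits shown.

```python
from math import sqrt
from statistics import NormalDist

# Values copied from the summarize output above.
n = 215754
ybar = 27.34663
s = 19.34841

se = s / sqrt(n)                    # standard error of the mean

p = 0.05                            # 1 minus the confidence coefficient
z = NormalDist().inv_cdf(1 - p / 2) # counterpart of invnorm(1-p/2)

lower = ybar - z * se
upper = ybar + z * se

print("Std. Err.:", round(se, 7))
print("95% interval:", round(lower, 5), "to", round(upper, 5))
```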

Equations for interval estimates. Confidence interval of a mean and proportion: c.i. = estimate +- z * se, where se = s / sqrt(N) for a mean and se = sqrt { [pihat * (1-pihat)] / N } for a proportion, and where you choose z based on the p-value for the confidence interval you want. Assumption: the sample size is large enough that the sampling distribution is approximately normal. ci = Ybar +- invnorm(1-p/2) * s / sqrt(N)

Notes on interval estimates: Usually, we are not given z. Instead we start with a desired confidence level (e.g., 95% confidence), and we select an appropriate z-score. We generally use a 2-tailed distribution in which ½ of the confidence interval is on each side of the sample mean. What does this do to our choice of p-values for the z-scores?

Equations for interval estimates. Example: find c.i. when Ybar = 10.2, s = 10.1, N = 1055, interval = 95%.
z is derived from the 95% value: what value of z leaves 95% in the middle and 2.5% on each end of a distribution? For p = .975, z = 1.96
The standard error is s/SQRT(n) = 10.1/SQRT(1055) = .31095
Top of the confidence interval is 10.2 + 1.96*.31095 = 10.8095
The bottom of the interval is 10.2 – 1.96*.31095 = 9.5905
Hence, the confidence interval is 9.59 to 10.81
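The same steps can be wrapped in a small helper. This is a sketch under the large-sample (normal) assumption stated in the lecture; `mean_ci` is a hypothetical name, not a Stata or library function.

```python
from math import sqrt
from statistics import NormalDist

def mean_ci(ybar, s, n, level=0.95):
    """Large-sample confidence interval for a mean: ybar +- z * s / sqrt(n)."""
    p = 1 - level                          # total probability left in the two tails
    z = NormalDist().inv_cdf(1 - p / 2)    # e.g. about 1.96 when level = 0.95
    se = s / sqrt(n)
    return ybar - z * se, ybar + z * se

lower, upper = mean_ci(10.2, 10.1, 1055)
print(round(lower, 2), "to", round(upper, 2))

# The same data at 99% confidence gives a wider interval.
lower99, upper99 = mean_ci(10.2, 10.1, 1055, level=0.99)
print(round(lower99, 2), "to", round(upper99, 2))
```

Raising the confidence coefficient widens the interval because a larger z is needed to leave less probability in the tails.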

Normality rules for confidence intervals: Confidence intervals assume a normal sampling distribution. Q: when can you assume normality for a sampling distribution of a continuous interval variable (such as income)? A1: when N >= 30 A2: when observations in the population can be assumed to be normally distributed.

5.3: Confidence intervals for population proportions: Confidence interval for a population proportion: c.i. = pihat +- z * sqrt { [pihat * (1-pihat)] / n }
Example: 424 of 1000 respondents in a poll report that they plan to vote for candidate X. Calculate a 95% c.i. for this result.
= .424 +- 1.96 * sqrt { [ .424 * (1-.424)] / 1000 }
= .424 +- 1.96 * sqrt { [ .424 * .576 ] / 1000 }
= .424 +- 1.96 * sqrt { .000244 }
= .424 +- 1.96 * .0156
= .424 +- 0.031
= .393 -> .455
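The poll example can be reproduced with a short function. This is a sketch of the large-sample interval for a proportion only; `proportion_ci` is an illustrative name, and it does not implement the exact binomial interval that Stata's `cii` reports for proportions.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, level=0.95):
    """Large-sample interval: p_hat +- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    se = sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se

lower, upper = proportion_ci(424, 1000)
print(round(lower, 3), "to", round(upper, 3))
```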

Normality rules for confidence intervals for sample proportions: Q: when can you assume normality for a sample of a dichotomous interval variable (yes = 1, no = 0) A: when n(p(1-p)) >= 10 (For what values of p do you need an extra large n to ensure a normal sampling distribution?) What can go wrong when you inappropriately assume a normal sampling distribution?
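To explore the parenthetical question, the sketch below applies the rule n(p(1-p)) >= 10 from the slide and finds the smallest n that satisfies it for a few values of p. The helper name `min_n_for_normality` and the chosen values of p are illustrative.

```python
from math import ceil

def min_n_for_normality(p, threshold=10):
    """Smallest n satisfying n * p * (1 - p) >= threshold (rule from the slide)."""
    return ceil(threshold / (p * (1 - p)))

for p in (0.5, 0.3, 0.1, 0.01):
    print("p =", p, "-> minimum n =", min_n_for_normality(p))
```

The required n grows quickly as p moves toward 0 or 1, because p(1-p) shrinks there.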

Putting it all together: Given the following sample of seven observations: 5,2,5,2,4,5,5 What is the 95% confidence interval of the population mean?
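A sketch of the arithmetic for this question follows, using the large-sample formula Ybar +- z * s / sqrt(N). With only seven observations the N >= 30 guideline from the normality slide is not met, so treat the result as an illustration of the mechanics and not as a trustworthy interval.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

sample = [5, 2, 5, 2, 4, 5, 5]
n = len(sample)

ybar = mean(sample)
se = stdev(sample) / sqrt(n)          # s / sqrt(N), with the n - 1 divisor in s
z = NormalDist().inv_cdf(0.975)       # 95% confidence, 2.5% in each tail

lower = ybar - z * se
upper = ybar + z * se
print(round(lower, 2), "to", round(upper, 2))
```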

What is the best phrasing for an interval estimate? a.) The 95% confidence interval for the population mean is 6.8 to 9.5? Or… b.) There is a 95% probability that the true population mean is between 6.8 and 9.5? Or… c.) We estimate that 95% of samples from the underlying population would fall within 1.35 of the true population mean, and we estimate that the true population mean is 8.15?

Confidence intervals using STATA
Confidence intervals for means and proportions using cii
95% confidence interval for General Social Survey sexfreq question as per A&F example 5.1
Command is: cii samplesize mean standarddeviation, level(level)
cii 1055 10.2 10.1, level(95)
    Variable |        Obs        Mean    Std. Err.       [95% Conf. Interval]
-------------+---------------------------------------------------------------
             |       1055        10.2    .3109533        9.589842    10.81016
* Variant with higher threshold for “confidence”
cii 1055 10.2 10.1, level(99)
    Variable |        Obs        Mean    Std. Err.       [99% Conf. Interval]
             |       1055        10.2    .3109533        9.397584    11.00242
* 95% confidence interval for proportion, as per A&F example 5.2
cii 1934 895, level(95)
                                                         -- Binomial Exact --
             |       1934    .4627715     .011338        .4403617    .4852942

5.4: Choosing the best sample size Cost is directly proportional to sample size, so we generally want the minimum sample to do the job. Estimating minimum sample size is commonly done with population proportions. With population proportions, you do not need to make separate guesses about the population mean and standard deviation. With population proportions, it is easy to identify a conservative value (π = .50, which gives the largest required sample), and the bias does not vary much.

Choosing the best sample size for a population proportion We already have an equation for the confidence interval: c.i. = pihat +- z * sqrt { [pihat * (1-pihat)] / n } Agresti and Finlay’s term for one half of the confidence interval is the confidence bound B. When we choose the best sample size, we choose one half of the confidence interval (the top one) and solve for n: B = z * sqrt { [π * (1-π)] / n }, so n = π * (1-π) * (z / B)^2

Sample size example: Example: Sample size for election poll: Desired 95% c.i. = + or – 3% Preliminary estimate: π = .50 What sample size is needed?
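A sketch of the calculation for this example follows. It assumes the standard rearrangement of B = z * sqrt(π(1-π)/n) into n = π(1-π)(z/B)^2, then rounds up to a whole respondent. The function name is illustrative.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_proportion(pi_guess, bound, level=0.95):
    """Minimum n so that z * sqrt(pi(1-pi)/n) is no larger than `bound`."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return ceil(pi_guess * (1 - pi_guess) * (z / bound) ** 2)

n_needed = sample_size_for_proportion(pi_guess=0.50, bound=0.03)
print(n_needed)
```

Because π(1-π) is largest at π = .50, this preliminary estimate gives the most cautious (largest) sample size.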

Choosing the best sample size for a sample mean Estimating minimum sample size is less commonly done with population means With population means, you need to make separate guesses about the population mean and standard deviation. We generally have a hard time making a good guess about a population standard deviation without measuring it.

Choosing the best sample size for a population mean We already have an equation for the confidence interval: c.i. = Ybar +- z * s / sqrt(n) Again, Agresti and Finlay’s term for one half of the confidence interval is the confidence bound B. When we choose the best sample size, we choose one half of the confidence interval (the top one) and solve for n: B = z * σ / sqrt(n), so n = σ^2 * (z / B)^2

Sample size example: Example: Sample size for study of educational attainment among elderly native Americans: Desired 99% c.i. = + or –1 year Preliminary estimates: μ = 12, σ = 2.5 What sample size is needed?
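A sketch of the calculation for this example follows. It assumes the rearrangement of B = z * σ / sqrt(n) into n = σ^2 (z/B)^2, then rounds up. Only the guessed standard deviation enters the formula; the guessed mean does not. The function name is illustrative.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_mean(sigma_guess, bound, level=0.95):
    """Minimum n so that z * sigma / sqrt(n) is no larger than `bound`."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return ceil((sigma_guess * z / bound) ** 2)

n_needed = sample_size_for_mean(sigma_guess=2.5, bound=1.0, level=0.99)
print(n_needed)
```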