Session 7 Standard errors, Estimation and Confidence Intervals

Presentation transcript:

1 Session 7 Standard errors, Estimation and Confidence Intervals

2 Learning Objectives: By the end of this session, you will be able to: explain what is meant by an estimate of a population parameter, and its standard error; explain the meaning of a confidence interval; calculate a confidence interval for the population mean using sample data; and state the assumptions underlying the above calculation.

3 Reminder: What is inference? Inference is about drawing conclusions concerning population characteristics using information gathered from the sample. It is assumed that the sample is representative of the population. A further assumption is that the sample has been drawn as a simple random sample from an infinite population.

4 Estimation

                 Population   Sample
Mean             μ            x̄
Variance         σ²           s²
Std. deviation   σ            s

Population characteristics (parameters) are denoted by Greek letters, sample values by Latin letters. Sample characteristics are measurable and form estimates of the population values.

5 Example of statistical inference: What is the mean number of persons per household in Mukono district? Data from 80 households surveyed in this district gave a mean household size of 5.6 with a standard deviation of 3.298. Our best estimate of the mean household size in Mukono district is therefore 5.6. What results are likely if we sampled again with a different set of households?

6 Example using Stata: Open the Stata file UNHS_hh&poverty.dta. The numeric district code for Mukono is 109.

7 Use summarize dialogue: Type db summarize, or use the menu Statistics > Summaries, tables > Summaries > Summary Statistics. Set the variable to hhsize. Then, on the by/if/in tab, enter dist == 109 as the condition.
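
For readers without Stata, the same restricted summary can be sketched in Python. This is only an illustration of what the condition dist == 109 does: the records below are hypothetical, made-up values (not taken from UNHS_hh&poverty.dta), and the helper name summarize is an assumption chosen to mirror the Stata command.

```python
from statistics import mean, stdev

# Hypothetical records standing in for the survey file; the values are
# made up for illustration and are NOT taken from UNHS_hh&poverty.dta.
households = [
    {"dist": 109, "hhsize": 4},
    {"dist": 109, "hhsize": 7},
    {"dist": 109, "hhsize": 6},
    {"dist": 101, "hhsize": 3},
    {"dist": 101, "hhsize": 5},
]


def summarize(records, variable, condition=lambda r: True):
    """Return (n, mean, sample std. dev.) of `variable` for records meeting `condition`."""
    values = [r[variable] for r in records if condition(r)]
    return len(values), mean(values), stdev(values)


# Restrict to district code 109 (Mukono), mirroring the condition dist == 109.
n_mukono, mean_mukono, sd_mukono = summarize(
    households, "hhsize", condition=lambda r: r["dist"] == 109
)

# Summary for the whole (hypothetical) sample, with no condition applied.
n_all, mean_all, sd_all = summarize(households, "hhsize")

print(f"Mukono only:  n={n_mukono}, mean={mean_mukono:.2f}, sd={sd_mukono:.2f}")
print(f"Whole sample: n={n_all}, mean={mean_all:.2f}, sd={sd_all:.2f}")
```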

8 Results: [Stata output: summaries for Mukono only, and summaries for the whole sample]

9 The distribution of means: Suppose 10 University students were given a standard meal and the time taken to consume the meal was recorded for each. Suppose the 10 values gave mean = 11.24, with std. dev. = [value not shown]. Let's assume this exercise was repeated 50 times with different samples of students. A histogram of the resulting 500 observations appears below, followed by a histogram of the 50 means from each sample.

10 Histogram of raw data: The data appear to follow a normal distribution.

11 Histogram of 50 sample means: The distribution of the sample mean is called its Sampling Distribution. Notice that the variability of the above distribution is smaller than the variability of the raw data.
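
The repeated-sampling exercise can be imitated with a short simulation. The sketch below is a rough illustration under stated assumptions: the population is taken to be normal with mean 11.24 (the value quoted for the meal example), and the population standard deviation of 2.0 is a hypothetical choice, since the original value is not shown. It draws 50 samples of 10 and compares the spread of the 500 raw observations with the spread of the 50 sample means.

```python
import random
from statistics import mean, stdev

random.seed(7)  # fixed seed so the sketch is reproducible

POP_MEAN = 11.24   # mean eating time quoted in the meal example
POP_SD = 2.0       # ASSUMPTION: hypothetical value; the original std. dev. is not shown
N_SAMPLES = 50     # number of repeated samples
SAMPLE_SIZE = 10   # students per sample

# Draw 50 samples of 10 observations each from a normal population.
samples = [
    [random.gauss(POP_MEAN, POP_SD) for _ in range(SAMPLE_SIZE)]
    for _ in range(N_SAMPLES)
]

raw_data = [x for sample in samples for x in sample]   # all 500 observations
sample_means = [mean(sample) for sample in samples]    # the 50 sample means

sd_raw = stdev(raw_data)
sd_means = stdev(sample_means)

print(f"std. dev. of the {len(raw_data)} raw observations: {sd_raw:.2f}")
print(f"std. dev. of the {len(sample_means)} sample means: {sd_means:.2f}")
```

Under these assumptions the spread of the means should come out close to POP_SD divided by the square root of 10, which is why the histogram of means is so much narrower than the histogram of raw data.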

12 Back to estimation: The estimate of the mean household size in Mukono district was 5.6. Is this sufficient for reporting purposes, given that this answer is based on one particular sample? What we have is an estimate based on a sample of size 80. But how good is this estimate? We need a measure of the precision, i.e. variability, of this estimate.

13 Sampling Variability: The accuracy of the sample mean x̄ as an estimate of μ depends on: (i) the sample size (n), since the more data we collect, the more we know about the population, and (ii) the inherent variability in the data, σ². These two quantities must enter the measure of precision of any estimate of a population parameter. We aim for high precision, i.e. low standard error!

14 Standard error of the mean: The precision of x̄ as an estimate of μ is given by the standard error of the mean, σ/√n, also written as s.e.m., or sometimes s.e. It is estimated using the sample data: s/√n. For the example on household size, s.e. = 3.298/√80 = 3.298/8.944 = 0.369.
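
The standard error calculation is simple enough to check by hand, but a small sketch makes the dependence on n explicit. The function name standard_error is an assumption for illustration; the inputs 3.298 and 80 are the household-size figures from the example.

```python
import math


def standard_error(sample_sd, n):
    """Estimated standard error of the mean, s / sqrt(n)."""
    if n < 2:
        raise ValueError("need at least two observations")
    return sample_sd / math.sqrt(n)


# Household-size example: s = 3.298, n = 80.
se_household = standard_error(3.298, 80)
print(f"s.e. = {se_household:.3f}")

# Quadrupling the sample size halves the standard error.
se_4x = standard_error(3.298, 320)
print(f"s.e. with n = 320: {se_4x:.3f}")
```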

15 Confidence Interval for μ: Instead of using a point estimate, it is usually more informative to summarise using an interval which is likely (i.e. with 95% confidence) to contain μ. This is called an interval estimate or a Confidence Interval (C.I.). For example, we could report that the mean household size of HHs in Mukono district is 5.6 with 95% confidence interval (4.87, 6.33), i.e. we are 95% confident that the interval (4.87, 6.33) includes the true value of μ.

16 Analysis using Stata: Type db ci, or use the menu. Use the by/if/in tab as before.

17 Results: [Stata output: confidence intervals for the whole sample, and just for Mukono]

18 Finding the Confidence Interval: The 95% confidence limits for μ (lower and upper) are calculated as x̄ − tₙ₋₁ (s/√n) and x̄ + tₙ₋₁ (s/√n), where tₙ₋₁ is the two-sided 5% point of the t-distribution with (n − 1) degrees of freedom. Statistical tables and statistical software give t-values.
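
As a check on the formula, the sketch below recomputes the interval for the Mukono household-size example (mean 5.6, s = 3.298, n = 80). The function name confidence_interval is an assumption for illustration, and the t-value is supplied by hand: 1.990 is the approximate two-sided 5% point for 79 degrees of freedom, as would be read from tables or software.

```python
import math


def confidence_interval(sample_mean, sample_sd, n, t_value):
    """Return (lower, upper) limits: mean -/+ t * s / sqrt(n)."""
    half_width = t_value * sample_sd / math.sqrt(n)
    return sample_mean - half_width, sample_mean + half_width


# ASSUMPTION: approximate two-sided 5% t-value for 79 degrees of freedom.
T_79 = 1.990

lower, upper = confidence_interval(5.6, 3.298, 80, T_79)
print(f"95% C.I.: ({lower:.2f}, {upper:.2f})")
```

Passing the t-value in as an argument keeps the sketch free of external libraries; with statistical software the t-value would normally be looked up from the degrees of freedom.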

19 t-values for finding 95% C.I.: [Table of t-values, headed P = …]

20 Correct interpretation of C.I.s: If we sampled repeatedly and found a 95% C.I. each time, about 95% of them would include the true μ, i.e. there is a 95% chance that a single interval, computed from a fresh sample, would include μ.
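
The repeated-sampling interpretation can be explored by simulation. The sketch below is an illustration only: the population (normal, mean 10.0, std. dev. 2.0), the sample size of 10 and the number of repetitions are all hypothetical choices, and 2.262 is the approximate two-sided 5% t-value for 9 degrees of freedom. It counts how often the interval from a fresh sample includes the true mean.

```python
import math
import random
from statistics import mean, stdev

random.seed(2024)  # fixed seed so the sketch is reproducible

TRUE_MEAN = 10.0   # ASSUMPTION: hypothetical population mean
TRUE_SD = 2.0      # ASSUMPTION: hypothetical population std. dev.
N = 10             # sample size
T_9 = 2.262        # approximate two-sided 5% t-value, 9 degrees of freedom
REPS = 2000        # number of repeated samples


def covers(sample, true_mean, t_value):
    """True if the 95% C.I. from `sample` includes `true_mean`."""
    half_width = t_value * stdev(sample) / math.sqrt(len(sample))
    return abs(mean(sample) - true_mean) <= half_width


hits = 0
for _ in range(REPS):
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    if covers(sample, TRUE_MEAN, T_9):
        hits += 1

coverage = hits / REPS
print(f"Proportion of intervals including the true mean: {coverage:.3f}")
```

The proportion should land near 0.95, though it will vary a little from run to run if the seed is changed.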

21 An example (persons per HH): For rural households (n = 40) in Mukono, we find mean = 6.43, std. dev. = 3.54 for the number of persons per household. Hence a 95% confidence interval for the true mean number of persons per household is: 6.43 ± t₃₉ (s/√n) = 6.43 ± 2.023 × (3.54/√40) = 6.43 ± 1.13 = (5.30, 7.56). Can you interpret this interval? Write down your answer. We will then discuss.

22 Analysis in Stata: Press Page Up to retrieve the last command, then add & rurban == 0 to the condition. Or use the menus and change the dialogue.

23 Underlying assumptions: The above computation of a confidence interval assumes that the data have a normal distribution. More exactly, it requires the sampling distribution of the mean to have a normal distribution. What happens if data are not normal? This is not a serious problem if the sample size is large, because of the Central Limit Theorem, i.e. the sampling distribution of the mean is approximately normal for large sample sizes.

24 Assumptions - continued: So even when data are not normal, the formula for a 95% confidence interval will give an interval whose confidence is still high - approximately 95%. It is better to attach some measure of uncertainty than to worry about the exact confidence level.
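
The effect of non-normal data can also be explored by simulation. The sketch below is an illustration under stated assumptions: the data are drawn from an exponential distribution with mean 1.0 (a hypothetical, strongly skewed population), and the t-values 2.776 (4 degrees of freedom) and 2.010 (49 degrees of freedom) are approximate table values. It estimates how often the usual 95% interval includes the true mean for a small and a larger sample size.

```python
import math
import random
from statistics import mean, stdev

random.seed(11)  # fixed seed so the sketch is reproducible


def coverage(n, t_value, reps=2000, true_mean=1.0):
    """Proportion of 95% t-intervals that include the true mean of skewed data."""
    hits = 0
    for _ in range(reps):
        # ASSUMPTION: exponential data, chosen only because it is clearly non-normal.
        sample = [random.expovariate(1.0 / true_mean) for _ in range(n)]
        half_width = t_value * stdev(sample) / math.sqrt(n)
        if abs(mean(sample) - true_mean) <= half_width:
            hits += 1
    return hits / reps


coverage_small = coverage(n=5, t_value=2.776)    # approx. t-value, 4 d.f.
coverage_large = coverage(n=50, t_value=2.010)   # approx. t-value, 49 d.f.

print(f"n = 5:  proportion including the true mean = {coverage_small:.3f}")
print(f"n = 50: proportion including the true mean = {coverage_large:.3f}")
```

One would expect the proportion for the larger sample to be closer to 0.95 than for the small sample, in line with the Central Limit Theorem argument, although neither need equal 95% exactly.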