Published by Edwina Horn. Modified over 9 years ago.
1
Statistics for Social and Behavioral Sciences Session #15: Interval Estimation, Confidence Interval (Agresti and Finlay, Chapter 5) Prof. Amine Ouazad
2
Statistics Course Outline Part I. Introduction and Research Design (Week 1). Part II. Describing Data (Weeks 2-4). Part III. Drawing Conclusions from Data: Inferential Statistics (Weeks 5-9). Part IV. Correlation and Causation: Regression Analysis (Weeks 10-14). This is where we talk about ZMapp and Ebola! Firenze or Lebanese Express’s ratings are within a MoE of each other!
3
Last Session: Inference A conservative Margin of Error (= 2 standard errors) for Café Firenze’s restaurant rating is 1.1 with 14 votes. For any rating from 1 to 5, the largest possible Margin of Error is 4/√N, where N is the number of ratings. With TripAdvisor, we see the rating of each individual customer, and so we can calculate s_X! Central Limit Theorem: with a large sample size N, the sampling distribution of the sample mean is approximately normal. The mean of the sampling distribution is the population mean. The standard deviation of the sampling distribution is σ_X/√N, where σ_X is the standard deviation of X.
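The 4/√N bound can be checked with a few lines of Python. This is a minimal sketch, not part of the original slides: the function name `conservative_moe` is hypothetical, and it relies on the fact that a variable confined to an interval has a standard deviation of at most half the interval's width.

```python
import math

def conservative_moe(n_ratings, low=1.0, high=5.0):
    """Largest possible margin of error (2 standard errors) for a bounded rating.

    Hypothetical helper for illustration only.
    """
    # A rating confined to [low, high] has standard deviation at most
    # (high - low) / 2, reached when half the ratings sit at each extreme.
    max_sd = (high - low) / 2
    # Margin of error = 2 standard errors = 2 * sd / sqrt(N).
    return 2 * max_sd / math.sqrt(n_ratings)

# 14 votes, as for Café Firenze: rounds to the 1.1 quoted on the slide.
print(round(conservative_moe(14), 1))
```

With ratings from 1 to 5 the maximum standard deviation is 2, so the bound is 2 × 2/√N = 4/√N.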
4
Today Use this margin of error to provide interval estimates: – A 95% confidence interval for Café Firenze is [2.3,4.5]. – “The true rating of Café Firenze is between 2.3 and 4.5 with probability 95%”. – Note: average was 3.4 and MoE was 1.1. – A 95% confidence interval for Cory Gardner’s vote share in Colorado is [48-3.6,48+3.6]=[44.4,51.6]. – “The true vote share for Cory Gardner is between 44.4% of the vote and 51.6% of the vote with 95% probability”. – Note: MoE was 3.6.
6
News: Last Tuesday We learnt the population proportion !!! – Proportion of voters for Cory Gardner. The latest poll was giving us a sample proportion of the vote p (N around 1000).
7
Outline 1. Interval Estimation, Confidence Interval 2. Choosing between 90, 95, 99% confidence 3. When distributions are normal: t-distribution Next time: Estimation, Confidence Intervals (continued) Chapter 5 of A&F
8
Parameters and Interval Estimate An interval estimate is an interval of numbers around the point estimate, which includes the parameter with a chosen probability, typically 90%, 95%, or 99%. Example: “the interval estimate [156.2 cm – 0.49 cm ; 156.2 cm + 0.49 cm] includes the population average height with probability 95%.” Sample mean: 156.2 cm, MoE = 0.49 cm.
9
Parameters and Interval Estimate An interval estimate that includes the parameter with probability 95% is called a 95% confidence interval. The expression “95% confidence interval” is widely used. Example: “[156.2 cm – 0.49cm ; 156.2 cm + 0.49cm] is a 95% confidence interval for the population average height.” Sample mean: 156.2cm, MoE = 0.49 cm.
10
How do we build a 95% confidence interval? Goal: estimate the population average μ. From previous sessions: [μ – MoE ; μ + MoE] includes the sample mean m with probability 95%. Equivalently, m is within MoE of μ with probability 95%, so we conclude: the interval [m – MoE ; m + MoE] includes the population mean μ with probability 95%. [m – MoE ; m + MoE] is a 95% confidence interval for μ. MoE = 1.96 × Standard Error. Standard Error = s_X/√N. We use 1.96 instead of 2 from now on.
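The recipe on this slide can be sketched in Python as follows. The function name `confidence_interval_95` and the 400 ratings are hypothetical, made up purely for illustration; only the formula MoE = 1.96 × s_X/√N comes from the slide.

```python
import math
import statistics

def confidence_interval_95(sample):
    """Return (lower, upper) bounds of a 95% confidence interval for the mean.

    Assumes the sample is large enough for the Central Limit Theorem to apply.
    """
    m = statistics.mean(sample)              # sample mean m
    s = statistics.stdev(sample)             # sample standard deviation s_X
    standard_error = s / math.sqrt(len(sample))
    moe = 1.96 * standard_error              # margin of error
    return m - moe, m + moe

# Hypothetical data: 400 ratings on a 1-to-5 scale (not from the slides).
ratings = [1] * 40 + [2] * 60 + [3] * 100 + [4] * 120 + [5] * 80

low, high = confidence_interval_95(ratings)
print(f"95% confidence interval: [{low:.2f}, {high:.2f}]")
```

The interval is centred on the sample mean, and its half-width is the margin of error.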
13
Outline 1. Interval Estimation, Confidence Interval 2. Choosing between 90, 95, 99% confidence 3. When distributions are normal: t-distribution Next time: Estimation, Confidence Intervals (continued) Chapter 5 of A&F
14
Choosing between 90%, 95%, 99% The interval estimate [Sample Mean – MoE, Sample Mean + MoE] includes the population mean (the parameter) with probability: 99% if MoE = 2.58 * Standard Error 95% if MoE = 1.96 * Standard Error 90% if MoE = 1.65 * Standard Error The width of a confidence interval: 1.Increases as the confidence level increases. 2.Decreases as the sample size increases.
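The three multipliers are quantiles of the standard normal distribution, and they can be recovered with Python's standard library. This is a sketch under stated assumptions: the helper names `z_multiplier` and `ci_width` are hypothetical, and the sample sizes 100 and 400 are arbitrary illustrative values. The slide's 1.65 and 2.58 are rounded versions of the values printed here.

```python
import math
from statistics import NormalDist

def z_multiplier(confidence):
    """Multiplier z such that mean +/- z * SE covers with the given probability."""
    # Two-sided interval: leave (1 - confidence) / 2 in each tail.
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def ci_width(s, n, confidence):
    """Full width of the confidence interval: 2 * z * s / sqrt(n)."""
    return 2 * z_multiplier(confidence) * s / math.sqrt(n)

for level in (0.90, 0.95, 0.99):
    print(level, round(z_multiplier(level), 3))

# Illustrative widths with s = 29.9 and hypothetical sample sizes.
print(ci_width(29.9, 100, 0.95), ci_width(29.9, 400, 0.95))
```

Quadrupling the sample size halves the width, because the standard error shrinks with √N.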
15
Building 90%, 95%, 99% confidence intervals Exercise: The sample mean weight (a sample of individuals in the US) is 60.0 kg, and the sample standard deviation is 29.9 kg. Find a 90% (resp., 95%, 99%) confidence interval for the population mean weight.
16
Why 90%, 95%, 99%? Confidence intervals were invented by Jerzy Neyman in the 1930s. R.A. Fisher developed the theory of statistical testing. Sample sizes were small at the time (a few hundred), and 95% seemed a reasonable confidence level. The medical sciences adopted confidence intervals soon after their invention. 95% became the standard. (Image: R.A. Fisher)
17
Outline 1. Interval Estimation, Confidence Interval 2. Choosing between 90, 95, 99% confidence 3. When distributions are normal: t-distribution Next time: Estimation, Confidence Intervals (continued) Chapter 5 of A&F
18
Central Limit Theorem Requires a large sample size N. This is because it applies to any distribution of X. Example #1: – We had a sample of N songs, and the number of times X_i that song had been played. – The number of times X_i a song is played on Spotify does not have a normal distribution. – But we can build a confidence interval for the average number of times a song is played (μ), provided we have a large enough number N of songs. – MoE = 1.96 * σ_X/√N for a 95% confidence interval.
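A small simulation can illustrate why the interval still works when X is far from normal. This is only a sketch: an exponential distribution stands in for the skewed play counts (an assumption, not the Spotify data from the lecture), the function name `coverage` is hypothetical, and the sample standard deviation s_X replaces the unknown σ_X.

```python
import math
import random

def coverage(n, reps, true_mean=1.0, seed=0):
    """Fraction of simulated 95% intervals that contain the true mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # Exponential draws: strongly right-skewed, clearly not normal.
        sample = [rng.expovariate(1.0 / true_mean) for _ in range(n)]
        m = sum(sample) / n
        s = math.sqrt(sum((x - m) ** 2 for x in sample) / (n - 1))
        moe = 1.96 * s / math.sqrt(n)
        if m - moe <= true_mean <= m + moe:
            hits += 1
    return hits / reps

# With a large N the share of intervals covering the true mean should be near 0.95.
print(coverage(n=400, reps=2000))
```

Rerunning with a much smaller `n` would show the coverage drifting away from 95%, which is the problem the following slides address.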
19
With sample mean m = 360.63, we can use our formulas to find a 95% confidence interval for μ, because N is large, even though X does not have a normal distribution.
20
What if N is small? If N is “small”, the Central Limit Theorem does not apply…. – We cannot use our formulas. “Small” ? Less than a few hundred (from experience). If N is very small: These sampling distributions are not normal. (Figure panels: N=2, N=5.)
21
If N is small s_X is potentially very far from σ_X. But… we can still find confidence intervals if X is normal. The sampling distribution of the standardized sample mean, (m – μ)/(s_X/√N), is Student’s t distribution with degrees of freedom (df) equal to N-1; the standard error is still s_X/√N.
22
If N is small A 95% confidence interval for the population mean is: [Sample Mean – MoE, Sample Mean + MoE] With MoE = z * Standard Error, where z is now taken from the t distribution. z = 1.96 when the df = ∞. z > 1.96 when the df are small. See next table for the exact value of z.
23
t Table
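The effect of the larger multiplier can be seen in a simulation. This sketch assumes normally distributed data and N = 5, so df = 4; the value 2.776 is the standard t-table entry for 95% confidence with 4 degrees of freedom. The function name `coverage` and all simulation settings are hypothetical choices for illustration.

```python
import math
import random

T_MULTIPLIER_DF4 = 2.776  # standard t-table value: 95% confidence, df = 4

def coverage(multiplier, n=5, reps=20000, mu=0.0, sigma=1.0, seed=1):
    """Fraction of intervals mean +/- multiplier * SE that contain mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]  # X is normal
        m = sum(sample) / n
        s = math.sqrt(sum((x - m) ** 2 for x in sample) / (n - 1))
        moe = multiplier * s / math.sqrt(n)
        if m - moe <= mu <= m + moe:
            hits += 1
    return hits / reps

print("multiplier 1.96 :", coverage(1.96))
print("multiplier 2.776:", coverage(T_MULTIPLIER_DF4))
```

With only 5 observations, the 1.96 interval covers the population mean noticeably less often than 95% of the time, while the t-based interval should land close to 95%.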
24
Why is it called Student’s t distribution? The t distribution was allegedly invented by a person called Student. That “Student” was an engineer at Guinness’s Factories in Ireland: William Sealy Gosset. He was producing small samples of a drink, seeking guidance for industrial quality control: – He was trying a small number of samples (N=2,4, perhaps 7). – And from these samples was trying to infer the quality of all containers of the product (the population). W.S. Gosset and Some Neglected Concepts in Experimental Statistics: Guinnessometrics II, Stephen T. Ziliak, 2011.
25
Wrap up Interval estimates for a population mean (a parameter) when N is large, for any distribution of X. Build a confidence interval for a parameter: the interval [Sample Mean – MoE ; Sample Mean + MoE] includes the parameter with probability: 99% if MoE = 2.58 * Standard Error 95% if MoE = 1.96 * Standard Error 90% if MoE = 1.65 * Standard Error The t-distribution gives confidence intervals when the sample size N is small… and when the distribution of X is normal. Use z given by Table 5.1 of Agresti and Finlay for degrees of freedom N-1.
26
Coming up: Readings: This week and next week: – Chapter 5 entirely – estimation, confidence intervals. Online quiz deadline Tuesday 9am. Deadlines are sharp and attendance is followed. For help: Amine Ouazad Office 1135, Social Science building amine.ouazad@nyu.edu Office hour: Tuesday from 5 to 6.30pm. GAF: Irene Paneda Irene.paneda@nyu.edu Sunday recitations. At the Academic Resource Center, Monday from 2 to 4pm.