1
HS 1679: Inference About a Proportion
Inference about a proportion
Unit 9
2
Our data analysis journey continues …
Source: Greenhalgh, T. BMJ 1997;315:364-366. Copyright ©1997 BMJ Publishing Group Ltd.
3
Types of response variables
Response type
  Quantitative: sums, averages
  Categorical: counts, proportions
Prior chapters have focused on quantitative response variables. We now focus on categorical response variables.
4
Binary variables
We focus on the most popular type of categorical response, the binary response (a categorical variable with two categories, also called a dichotomous variable).
Examples of binary responses:
  CURRENT_SMOKER: yes/no
  SEX: male/female
  SURVIVED: yes/no
  DISEASE_STATUS: case/non-case
One category is arbitrarily labeled a “success”.
Count the number of successes in the sample.
Turn the count into a proportion.
5
Proportions
The proportion in the sample (statistic) is denoted "p-hat": $\hat{p} = x / n$, where x is the number of successes and n is the sample size.
The proportion in the population (parameter) is denoted p.
6
A proportion is an average of 0s and 1s
Example: n = 10, X is a binary attribute coded 1 = YES and 0 = NO
Observation  X
1            0
2            1
3            0
4            0
5            0
6            0
7            0
8            0
9            0
10           1
Σx = 2, so p-hat = 2 / 10 = 0.2
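The point of this slide can be checked in a few lines. The sketch below uses the ten 0/1 values from the example; the variable names are illustrative only.

```python
# X values for the ten observations in the example (1 = YES, 0 = NO)
x = [0, 1, 0, 0, 0, 0, 0, 0, 0, 1]

n = len(x)                 # sample size
successes = sum(x)         # summing 0s and 1s counts the successes
p_hat = successes / n      # sample proportion
mean_x = sum(x) / len(x)   # ordinary arithmetic mean of the 0/1 values

print(successes, p_hat, mean_x)
```

The mean and the proportion are the same calculation, which is why methods built for averages carry over to proportions.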
7
Incidence proportion and prevalence proportion
Incidence proportion (risk): proportion that develops the condition over a specified time
Prevalence: proportion with the condition at a point in time
Source: www.bioteach.ubc.ca/Biomedicine/Smallpox/
8
Example: “Smoking prevalence”
SRS, n = 57; determine the number of current smokers (“successes”): X = 17
p-hat = 17 / 57 = 0.2982
Calculations: carry at least 4 significant digits
Reporting (APA 2001): convert to percent and report as xx.x% (here, 29.8%)
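A minimal sketch of the calculation and the reporting convention, using the counts from the example. The formatting choices (4 decimal places for working, 1 decimal place for the percent) are one way to follow the slide's guidance, not the only way.

```python
x = 17   # current smokers ("successes") in the sample
n = 57   # sample size

p_hat = x / n

# Carry at least 4 significant digits while calculating
working_value = f"{p_hat:.4f}"

# Report as a percent with one decimal place
reported_value = f"{100 * p_hat:.1f}%"

print(working_value, reported_value)
```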
9
Inference about the proportion
How good is the sample proportion (p-hat) as an estimate of the population proportion (p)?
To answer this question, consider what would happen if we took many samples of size n from the same population.
The distribution of p-hat across these samples is the sampling distribution of p-hat.
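The "many samples" thought experiment can be imitated by simulation. The sketch below is an illustration under assumed values: p = 0.25 and n = 57 are chosen to match the smoking example, and the function name, seed, and number of repetitions are hypothetical choices.

```python
import math
import random
import statistics

def simulate_p_hats(p, n, reps, seed=0):
    """Draw `reps` samples of size n from a population with proportion p
    and return the sample proportion (p-hat) from each sample."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    p_hats = []
    for _ in range(reps):
        successes = sum(1 for _ in range(n) if rng.random() < p)
        p_hats.append(successes / n)
    return p_hats

p, n = 0.25, 57  # assumed population proportion and sample size
p_hats = simulate_p_hats(p, n, reps=5000)

sim_mean = statistics.mean(p_hats)
sim_sd = statistics.stdev(p_hats)
theory_sd = math.sqrt(p * (1 - p) / n)  # standard deviation of p-hat

print(sim_mean, sim_sd, theory_sd)
```

The simulated p-hats should center near p, with a spread close to sqrt(p(1 − p)/n).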
10
Binomial Sampling Distribution
The sampling distribution of the number of successes is binomial.
Binomial probabilities are difficult to calculate by hand.
However, the binomial becomes approximately Normal when n is large (central limit theorem).
The figure shows the number of smokers expected in a sample of n = 57 from a population in which p = 0.25. This distribution is binomial and approximately Normal.
We can use a Normal approximation to the binomial when n is large.
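To see how close the two distributions are for n = 57 and p = 0.25, the sketch below compares exact binomial probabilities with a Normal density that has the same mean (np) and standard deviation (sqrt(npq)). The helper names are illustrative.

```python
import math

def binom_pmf(k, n, p):
    """Exact binomial probability of k successes in n trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def normal_pdf(x, mu, sigma):
    """Normal density with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

n, p = 57, 0.25
mu = n * p                          # mean number of smokers
sigma = math.sqrt(n * p * (1 - p))  # standard deviation of the count

# Print the two curves side by side for a range of counts near the mean
for k in range(8, 22, 2):
    print(k, round(binom_pmf(k, n, p), 4), round(normal_pdf(k, mu, sigma), 4))
```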
11
Sampling Distribution of p-hat when n is large
When n is large, p-hat is approximately Normal with mean p and standard deviation $\sqrt{pq/n}$, where q = 1 − p.
12
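Confidence interval for p (plus 4 method)
Take an SRS and count the successes and failures. Add two imaginary successes and two imaginary failures to the statistics, and put a tilde over the revised statistics:
$\tilde{x} = x + 2, \quad \tilde{n} = n + 4, \quad \tilde{p} = \tilde{x} / \tilde{n}, \quad \tilde{q} = 1 - \tilde{p}$
Then calculate the CI according to this formula:
$\tilde{p} \pm z \sqrt{\tilde{p}\tilde{q} / \tilde{n}}$
Example: 17 smokers in n = 57, so $\tilde{p} = (17 + 2) / (57 + 4) = 19 / 61 = 0.3115$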
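The plus-four steps can be sketched as a short function. The slide does not state a confidence level for the example, so the default z = 1.96 (95% confidence) below is an assumption, and the function name is hypothetical.

```python
import math

def plus_four_ci(successes, n, z=1.96):
    """Plus-four confidence interval for a proportion.

    z defaults to 1.96 (95% confidence), which is an assumption here.
    """
    n_tilde = n + 4                    # two imaginary successes + two imaginary failures
    p_tilde = (successes + 2) / n_tilde
    q_tilde = 1 - p_tilde
    margin = z * math.sqrt(p_tilde * q_tilde / n_tilde)
    return p_tilde - margin, p_tilde + margin

low, high = plus_four_ci(17, 57)
print(f"{100 * low:.1f}% to {100 * high:.1f}%")
```

Under the 95% assumption, the interval for the smoking example runs from roughly 19.5% to 42.8%.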
13
Sample size requirements
To estimate p with margin of error d, use:
$n = z^2 p^* q^* / d^2$
where z is the Z value for the given level of confidence, p* is an educated guess for the proportion you want to estimate (use p* = 0.5 to get the “safest” estimate), and q* = 1 − p*.
Sample size calculations are always rounded up.
Example: Redo smoking survey; now want 95% CI with margin of error ±.05; assume p* = 0.30 (best available estimate). Then n = 1.96² × 0.30 × 0.70 / 0.05² = 322.7, so use n = 323.
Redo study but now want margin of error of ±.03.
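A small sketch of the sample size calculation, with rounding up built in. The function name is illustrative. The slide does not restate p* or the confidence level for the ±.03 case, so reusing p* = 0.30 and 95% confidence there is an assumption.

```python
import math

def sample_size(d, p_star=0.5, z=1.96):
    """Smallest n giving margin of error d, always rounded up.

    p_star = 0.5 is the "safest" guess; z = 1.96 corresponds to 95% confidence.
    """
    return math.ceil(z**2 * p_star * (1 - p_star) / d**2)

n_05 = sample_size(0.05, p_star=0.30)  # margin of error ±.05
n_03 = sample_size(0.03, p_star=0.30)  # margin of error ±.03 (p* = 0.30 assumed)
n_safe = sample_size(0.05)             # ±.05 with the "safest" guess p* = 0.5

print(n_05, n_03, n_safe)
```

Tightening the margin of error from ±.05 to ±.03 nearly triples the required sample size, because n grows with 1/d².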
14
Hypothesis Test
A. H0: p = p0 vs. H1: p ≠ p0, where p0 represents the proportion specified by the null hypothesis
B. Test statistic: $z = (\hat{p} - p_0) / \sqrt{p_0 q_0 / n}$, where q0 = 1 − p0
C. P-value (from z table)
D. Significance level
Illustration: Prevalence of smoking in the U.S. (p0) is 0.25. Take an SRS of n = 57 from a community and find 17 smokers. Therefore, p-hat = 17 / 57 = 0.2982. Is this significantly different from 0.25?
A. H0: p = 0.25 vs. H1: p ≠ 0.25
B. Test statistic: $z = (0.2982 - 0.25) / \sqrt{0.25 \times 0.75 / 57} = 0.84$
C. P = 0.4010 (two-sided)
D. Evidence against H0 is not significant (retain H0)
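Steps B and C of the illustration can be reproduced with the standard library. This is a sketch, with a hypothetical function name; it computes the two-sided P-value from the Normal distribution directly instead of a z table, so the result differs slightly from the table value of 0.4010, which corresponds to z rounded to 0.84.

```python
import math

def one_proportion_z_test(successes, n, p0):
    """Two-sided z test of H0: p = p0 using the Normal approximation."""
    p_hat = successes / n
    se0 = math.sqrt(p0 * (1 - p0) / n)  # standard error computed under H0
    z = (p_hat - p0) / se0
    # Two-sided P-value: 2 * (1 - Phi(|z|)), which equals erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

z, p_value = one_proportion_z_test(17, 57, 0.25)
print(round(z, 2), round(p_value, 2))
```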
15
Conditions for Inference
Valid information
SRS
To use Normal-based methods:
  For the plus-four confidence interval, n must be 10 or greater
  For the z test, np0q0 ≥ 5
Illustration: n = 57, p0 = 0.25, q0 = 0.75. Therefore, np0q0 = 57 ∙ 0.25 ∙ 0.75 = 10.7 → “OK”
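The two numeric conditions are easy to check before running either method. The helper names below are hypothetical, and the z test threshold is taken as "at least 5". The checks cover only the sample-size conditions, not whether the data are valid or come from an SRS.

```python
def plus_four_condition_ok(n, minimum=10):
    """Plus-four confidence interval: n must be 10 or greater."""
    return n >= minimum

def z_test_condition_ok(n, p0, threshold=5):
    """z test: n * p0 * q0 must be at least the threshold, with q0 = 1 - p0."""
    return n * p0 * (1 - p0) >= threshold

n, p0 = 57, 0.25
npq = n * p0 * (1 - p0)

print(round(npq, 1), z_test_condition_ok(n, p0), plus_four_condition_ok(n))
```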