HS 1679: Inference About a Proportion (Unit 9)


1 HS 1679: Inference About a Proportion. Unit 9

2 Our data analysis journey continues …
Figure source: Greenhalgh, T. BMJ 1997;315:364-366. Copyright ©1997 BMJ Publishing Group Ltd.

3 Types of response variables
Response type:
Quantitative: sums, averages
Categorical: counts, proportions
Prior chapters have focused on quantitative response variables. We now focus on categorical response variables.

4 Binary variables
We focus on the most common type of categorical response, the binary response (a categorical variable with two categories, also called a dichotomous variable).
Examples of binary responses:
CURRENT_SMOKER: yes/no
SEX: male/female
SURVIVED: yes/no
DISEASE_STATUS: case/non-case
One category is arbitrarily labeled a “success”. Count the number of successes in the sample, then turn the count into a proportion.

5 Proportions
The proportion in the sample (a statistic) is denoted "p-hat": the number of successes divided by the sample size n.
The proportion in the population (a parameter) is denoted p.

6 A proportion is an average of 0s and 1s
Example: n = 10; X = binary attribute coded 1 = YES and 0 = NO; Σx = 2
Observation  X
1            0
2            1
3            0
4            0
5            0
6            0
7            0
8            0
9            0
10           1
Σx = 2
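As a quick check of the idea on this slide, the following minimal Python sketch codes the ten observations as 0s and 1s and confirms that their average equals the sample proportion. The variable names are illustrative and not part of the original slides.

```python
# Binary attribute for the ten observations on the slide (1 = YES, 0 = NO).
x = [0, 1, 0, 0, 0, 0, 0, 0, 0, 1]

n = len(x)            # sample size
successes = sum(x)    # count of 1s, i.e. the sum of x
p_hat = successes / n # sample proportion
mean_x = sum(x) / n   # ordinary arithmetic mean of the 0/1 values

print(f"n = {n}, successes = {successes}, p-hat = {p_hat}, mean = {mean_x}")
```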

7 Incidence proportion and prevalence proportion
Incidence proportion (risk): the proportion that develops the condition over a specified time period.
Prevalence: the proportion with the condition at a point in time.
Source: www.bioteach.ubc.ca/Biomedicine/Smallpox/

8 Example: “Smoking prevalence”
Take an SRS of n = 57 and determine the number of current smokers (“successes”): X = 17.
p-hat = 17 / 57 = 0.2982
Calculations: carry at least 4 significant digits.
Reporting (APA 2001): convert to a percent and report as xx.x% (here, 29.8%).
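The calculation and reporting conventions on this slide can be sketched in a few lines of Python. The formatting choices (four decimal places for working, one decimal place for the percent) are one way to follow the stated rules, not a prescribed implementation.

```python
# Smoking prevalence example: 17 current smokers in an SRS of 57.
x, n = 17, 57

p_hat = x / n                       # keep full precision while calculating
working = f"{p_hat:.4f}"            # four significant digits for intermediate work
reported = f"{100 * p_hat:.1f}%"    # xx.x% format for reporting

print(working, reported)
```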

9 Inference about the proportion
How good is the sample proportion (p-hat) as an estimate of the population proportion (p)? To answer this question, consider what would happen if we took many samples of size n from the same population. The distribution of p-hat across these samples is the sampling distribution of p-hat.

10 Binomial sampling distribution
The sampling distribution of the number of successes is binomial.
Binomial probabilities are difficult to calculate.
However, the binomial becomes approximately Normal when n is large (central limit theorem).
The figure shows the number of smokers expected in a sample of n = 57 from a population in which p = 0.25. This distribution is binomial and also approximately Normal, so we can use a Normal approximation to the binomial when n is large.
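To see how close the two distributions are for the n = 57, p = 0.25 case, the sketch below compares exact binomial probabilities with a Normal approximation. The continuity correction (evaluating the Normal curve from k − 0.5 to k + 0.5) is an assumption added here for the comparison; the slides do not mention it.

```python
import math

n, p = 57, 0.25
mu = n * p                           # mean of the binomial count
sigma = math.sqrt(n * p * (1 - p))   # standard deviation of the binomial count

def binom_pmf(k):
    """Exact binomial probability of k successes."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def normal_cdf(z):
    """Standard Normal cumulative probability."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

def normal_approx(k):
    """Normal probability of the interval (k - 0.5, k + 0.5)."""
    return normal_cdf((k + 0.5 - mu) / sigma) - normal_cdf((k - 0.5 - mu) / sigma)

# Largest disagreement between the two over all possible counts.
max_gap = max(abs(binom_pmf(k) - normal_approx(k)) for k in range(n + 1))

for k in range(8, 22, 2):
    print(f"k = {k:2d}  binomial = {binom_pmf(k):.4f}  Normal approx = {normal_approx(k):.4f}")
print(f"largest gap = {max_gap:.4f}")
```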

11 Sampling distribution of p-hat when n is large
When n is large, p-hat is approximately Normal with mean p and standard deviation sqrt(p · q / n), where q = 1 − p.

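12 Confidence interval for p (plus-four method)
Take an SRS and count the successes and failures. Add two imaginary successes and two imaginary failures, and put a tilde over the revised statistics:
x-tilde = x + 2, n-tilde = n + 4, p-tilde = x-tilde / n-tilde, q-tilde = 1 − p-tilde
Then calculate the CI according to this formula:
p-tilde ± z · sqrt(p-tilde · q-tilde / n-tilde)
where z is the Z value for the given level of confidence.
Example: 17 smokers in n = 57, so x-tilde = 19, n-tilde = 61, and p-tilde = 19 / 61 = 0.3115.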
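The plus-four procedure can be sketched as a small Python function. The function name and the default z = 1.96 (a 95% confidence level) are assumptions for illustration; the slide's own worked numbers did not survive in the transcript.

```python
import math

def plus_four_ci(x, n, z=1.96):
    """Plus-four confidence interval for a proportion.

    x: observed successes, n: sample size,
    z: critical value (1.96 assumes 95% confidence).
    """
    x_tilde = x + 2                 # add two imaginary successes
    n_tilde = n + 4                 # two successes + two failures
    p_tilde = x_tilde / n_tilde
    q_tilde = 1 - p_tilde
    margin = z * math.sqrt(p_tilde * q_tilde / n_tilde)
    return p_tilde - margin, p_tilde + margin

lower, upper = plus_four_ci(17, 57)
print(f"95% CI for p: ({lower:.4f}, {upper:.4f})")
```

Under these assumptions the interval for the smoking example runs from roughly 0.195 to 0.428.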

13 Sample size requirements
To estimate p with margin of error d, use
n = z² · p* · q* / d²
where z is the Z value for the given level of confidence, p* is an educated guess for the proportion you want to estimate (use p* = 0.5 to get the “safest” estimate), and q* = 1 − p*.
Sample size calculations are always rounded up.
Example: Redo the smoking survey; now want a 95% CI with margin of error ±.05; assume p* = 0.30 (best available estimate).
Redo the study, but now with a margin of error of ±.03.
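A minimal sketch of the sample size calculation follows. The function name is hypothetical, and the ±.03 case assumes the same 95% confidence level and p* = 0.30 as the ±.05 case, which the slide does not state explicitly.

```python
import math

def sample_size(d, p_star=0.5, z=1.96):
    """Sample size needed to estimate p within margin of error d.

    p_star: educated guess for p (0.5 gives the most conservative n),
    z: critical value (1.96 assumes 95% confidence).
    The result is always rounded up.
    """
    q_star = 1 - p_star
    return math.ceil(z**2 * p_star * q_star / d**2)

n_05 = sample_size(0.05, p_star=0.30)   # margin of error ±.05
n_03 = sample_size(0.03, p_star=0.30)   # margin of error ±.03 (same p* assumed)
n_safe = sample_size(0.05)              # "safest" guess p* = 0.5

print(n_05, n_03, n_safe)
```

Tightening the margin of error from ±.05 to ±.03 nearly triples the required sample size, because n grows with 1 / d².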

14 Hypothesis test
A. H0: p = p0 vs. H1: p ≠ p0, where p0 represents the proportion specified by the null hypothesis
B. Test statistic: z = (p-hat − p0) / sqrt(p0 · q0 / n), where q0 = 1 − p0
C. P-value (from z table)
D. Significance level
Illustration: The prevalence of smoking in the U.S. (p0) is 0.25. Take an SRS of n = 57 from a community and find 17 smokers. Therefore, p-hat = 17 / 57 = 0.2982. Is this significantly different from 0.25?
A. H0: p = 0.25 vs. H1: p ≠ 0.25
B. Test statistic: z = (0.2982 − 0.25) / sqrt(0.25 · 0.75 / 57) = 0.84
C. P = 0.4010 (two-sided)
D. Evidence against H0 is not significant (retain H0)
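The four steps can be reproduced with a short Python sketch. The function name is hypothetical. Note that the code keeps z unrounded, so its P-value differs slightly in the third decimal place from the table-based 0.4010, which corresponds to z rounded to 0.84.

```python
import math

def one_proportion_z_test(x, n, p0):
    """Two-sided z test of H0: p = p0 using the Normal approximation."""
    p_hat = x / n
    se0 = math.sqrt(p0 * (1 - p0) / n)             # standard error under H0
    z = (p_hat - p0) / se0
    p_value = math.erfc(abs(z) / math.sqrt(2))     # equals 2 * (1 - Phi(|z|))
    return p_hat, z, p_value

p_hat, z, p_value = one_proportion_z_test(17, 57, 0.25)
print(f"p-hat = {p_hat:.4f}, z = {z:.2f}, two-sided P = {p_value:.4f}")
```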

15 Conditions for inference
Valid information
SRS
To use the Normal-based methods:
For the plus-four confidence interval, n must be 10 or greater
For the z test, n · p0 · q0 ≥ 5
Illustration: n = 57, p0 = 0.25, q0 = 0.75. Therefore, n · p0 · q0 = 57 ∙ 0.25 ∙ 0.75 = 10.7 → “OK”
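The two sample size conditions are easy to check before running either procedure. The helper names below are hypothetical, and the thresholds (10 and 5) are the ones given on this slide.

```python
def plus_four_ok(n):
    """Plus-four CI condition from the slide: n of at least 10."""
    return n >= 10

def z_test_ok(n, p0):
    """z test condition from the slide: n * p0 * q0 of at least 5."""
    return n * p0 * (1 - p0) >= 5

n, p0 = 57, 0.25
npq = n * p0 * (1 - p0)
print(f"n*p0*q0 = {npq:.1f}, z test OK: {z_test_ok(n, p0)}, plus-four OK: {plus_four_ok(n)}")
```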

