Categorical data 1: Single proportion and comparison of 2 proportions. Dr. Seyed Ebrahim Jabarifar. Date: 1388 / 2010. Associate Professor, University of ... Sciences.


Categorical data 1: Single proportion and comparison of 2 proportions. Dr. Seyed Ebrahim Jabarifar. Date: 1388 / 2010. Associate Professor, Isfahan University of Medical Sciences, Department of Community Dentistry.

The objectives of the session: sampling distribution of a single proportion; calculation of a 95% confidence interval for a proportion; comparison of two proportions (or percentages); statistical test of significance for the comparison of two proportions; calculation of a 95% confidence interval for the difference in two proportions.

Categorical data: What is categorical data? Examples?

Examples of categorical data: Education: primary, secondary, university. Marital status: married, single, divorced, widowed. Cigarette smoking history: never smoker, ex-smoker, current smoker.

More examples of categorical data: Endpoint in a study: person is dead or alive; person with MI or without MI. A person can rate their own health as very good, good, average, bad or very bad.

More examples of categorical data: Quantitative measurements or assessments can be used as categorical data. Hypertension: yes (for example systolic BP ≥ 160 or diastolic BP ≥ 90 mm Hg) or no. Alcohol consumption: none, light (< 200 ml of ethanol/week), heavy (≥ 200 ml of ethanol/week).

Proportions and percentages: In this session, we will concentrate on the use of binomial data (data with just two categories). Example: in a survey, interviews were conducted with 5335 middle-aged women. Of these, 1476 were current smokers while 3859 were not. Proportion of smokers = $1476/5335 = 0.277$. Percentage of smokers = $0.277 \times 100 = 27.7\%$.

Sampling variability of a proportion: It is important to take into account the number of subjects included. The greater the number of subjects, the more reliable our estimates are. Example: if we want to estimate the proportion of men in a population who smoke cigarettes, a study of 1000 men will be more trustworthy than a study of 10 men.

Important assumption: We need to know that the sample of individuals studied has been randomly selected from some population of interest.

Sampling distribution of a single proportion: Let's continue with the example of middle-aged women. Among 5335 women, there were 1476 smokers. If we want to say something about the population which this study sample represents, we need the concept of a sampling distribution.

Let's assume that we repeatedly took a sample of 5335 women. For each sample, we calculate the proportion of smokers and then construct a histogram of these values. This histogram represents the sampling distribution of the proportion and takes an approximately Normal, bell-shaped form.
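The repeated-sampling idea can be tried out numerically. The sketch below is only an illustration: it assumes a hypothetical population in which the true proportion of smokers is 0.277 (borrowed from the survey), draws 200 samples of 5335 women each, and prints a crude text histogram of the sample proportions. The function name and the number of repetitions are choices made for this example, not part of the original slides.

```python
import random
from statistics import mean

def simulate_sample_proportions(true_proportion, sample_size, n_samples, seed=0):
    """Draw repeated random samples; return the proportion of smokers in each."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    proportions = []
    for _ in range(n_samples):
        # Each woman is a smoker with probability true_proportion.
        smokers = sum(rng.random() < true_proportion for _ in range(sample_size))
        proportions.append(smokers / sample_size)
    return proportions

# Assumed (hypothetical) true proportion: 0.277, taken from the survey figure.
props = simulate_sample_proportions(0.277, 5335, 200)

# Crude text histogram: count sample proportions in bins of width 0.005.
bin_width = 0.005
counts = {}
for p in props:
    k = int(p / bin_width)
    counts[k] = counts.get(k, 0) + 1
for k in sorted(counts):
    print(f"{k * bin_width:.3f}-{(k + 1) * bin_width:.3f} {'#' * counts[k]}")

print("mean of the sample proportions:", round(mean(props), 4))
```

The bars should pile up around the assumed true proportion and thin out on both sides, which is the shape the slide describes.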

The curve is centred over the value of the proportion of smokers in the population, often referred to as the true proportion and represented by π. Some of the sample proportions will be larger than π, others will be smaller. Many will be close in value to π; a few will be a lot larger or a lot smaller. In practice we only conduct one survey, from which we have a sample proportion represented by P. Is P close to π, or is it very different from π?

Only if we are very lucky will P actually be equal to π. In any random sample, there will be some sampling variation in P. The larger the sample, the smaller the extent of such sampling variation. Consider $(P - \pi)^2$ as a measure of the variation in P from the true proportion π. Then it can be shown mathematically that if you took lots of random samples, each of n subjects, the average value of $(P - \pi)^2$ is equal to $\pi(1 - \pi)/n$.

Variance and standard error of a proportion: $\pi(1 - \pi)/n$ is the variance of a proportion. $\sqrt{\pi(1 - \pi)/n}$ is the standard error of a proportion. The standard error is a measure of the average extent of error in P, that is, how far we can expect the observed proportion to differ from π on average.

Example: π = 0.4: n = 100, then SE = 0.0490; n = 1000, then SE = 0.0155 (SE smaller). SE does not depend much on π: n = 1000, π = 0.5: SE = 0.0158.
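The figures in this example can be checked with a few lines of code. This is a minimal sketch of the standard error formula $\sqrt{\pi(1 - \pi)/n}$; the function name `standard_error` is chosen here for illustration.

```python
from math import sqrt

def standard_error(pi, n):
    """Standard error of a proportion: sqrt(pi * (1 - pi) / n)."""
    return sqrt(pi * (1 - pi) / n)

# The three cases from the example.
for pi, n in [(0.4, 100), (0.4, 1000), (0.5, 1000)]:
    print(f"pi = {pi}, n = {n}: SE = {standard_error(pi, n):.4f}")
```

Increasing n tenfold shrinks the SE by a factor of $\sqrt{10}$, while moving π from 0.4 to 0.5 barely changes it.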

Back to the example: 5335 women, 1476 current smokers. This means that 27.7% of the women are smokers. The estimated standard error of the proportion of smokers is $\sqrt{0.277 \times 0.723 / 5335} = 0.0061$. We can also use percentages: $\sqrt{27.7 \times 72.3 / 5335} = 0.61\%$.

95% confidence limits for a proportion: We want to get an interval of possible values within which the true population proportion might lie. This can be done using the theoretical properties of the Normal distribution. It can be shown that P will be within 1.96 standard errors of π with probability 0.95. That is, there is just a 2.5% risk that the observed proportion will exceed the true population proportion π by more than 1.96 standard errors, and another 2.5% risk that P will underestimate π by more than 1.96 standard errors.

95% confidence limits for a proportion: We use this fact to define a 95% confidence interval: $P - 1.96 \times \sqrt{P(1 - P)/n}$ to $P + 1.96 \times \sqrt{P(1 - P)/n}$. This is usually written as $P \pm 1.96 \times$ standard error of P.

Back to the example: The true population percentage of smokers has the following 95% confidence interval: $27.7\% \pm 1.96 \times 0.61\%$. This means that the 95% confidence interval is from 26.5% to 28.9%. These two values are the lower and upper confidence limits, respectively.
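The calculation for the survey of 5335 women can be sketched as follows, using the Normal-approximation interval $P \pm 1.96 \times \sqrt{P(1 - P)/n}$ described in the slides. The function name `proportion_ci_95` is an illustrative choice.

```python
from math import sqrt

def proportion_ci_95(successes, n):
    """Return (P, SE, lower limit, upper limit) for a single proportion."""
    p = successes / n
    se = sqrt(p * (1 - p) / n)  # estimated standard error of P
    return p, se, p - 1.96 * se, p + 1.96 * se

# 1476 current smokers among 5335 women.
p, se, lower, upper = proportion_ci_95(1476, 5335)
print(f"P = {100 * p:.1f}%, SE = {100 * se:.2f}%")
print(f"95% CI: {100 * lower:.1f}% to {100 * upper:.1f}%")
```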

95% confidence interval: 95% confidence intervals are the most common statistical technique for displaying the degree of uncertainty that should be attached to any proportion. There is a 5% risk that the true population proportion lies outside the interval. That is, you can anticipate that one in every 20 confidence intervals you calculate will not include π.
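The "one in every 20" statement can be explored by simulation. The sketch below assumes a hypothetical population with π = 0.3 and surveys of 1000 people each (both values are invented for the illustration), computes a 95% confidence interval from every simulated survey, and counts how often the interval contains π.

```python
import random
from math import sqrt

def coverage(true_pi, n, n_surveys, seed=1):
    """Fraction of simulated 95% confidence intervals that contain true_pi."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    covered = 0
    for _ in range(n_surveys):
        # One simulated survey of n people.
        p = sum(rng.random() < true_pi for _ in range(n)) / n
        se = sqrt(p * (1 - p) / n)
        if p - 1.96 * se <= true_pi <= p + 1.96 * se:
            covered += 1
    return covered / n_surveys

# Hypothetical settings: true proportion 0.3, 1000 surveys of 1000 people.
rate = coverage(0.3, 1000, 1000)
print(f"Intervals containing the true proportion: {100 * rate:.1f}%")
```

The printed percentage should be close to 95%, with some run-to-run variation if the seed is changed.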

Two proportions: example

Smoking   Men            Women          Total
Yes       566 (45.1%)    313 (23.8%)    879 (34.2%)
No        690            1001           1691
Total     1256           1314           2570

Question: From the table, we want to evaluate how strong the evidence is that men smoke more than women.

The null hypothesis: We need to define the null hypothesis. In our case, the null hypothesis is that smoking is as frequent among women as it is among men (the same proportion of smokers among men and women). If the null hypothesis were true, then men and women in the whole population would have an identical percentage of smokers. Alternatively, one can say that if the null hypothesis were true, then for any randomly selected person (man or woman) the probability of being a smoker is the same, independent of the sex of the person selected.

Significance testing for comparing 2 proportions: After defining the null hypothesis, the main question is: if the null hypothesis is true, what are the chances of getting a difference between the two percentages as large as that observed? For example, in the Czech study, what is the probability of getting a sex difference in smoking as large as (or larger than) 45% versus 24%?

Observed difference in percentages = $P_1 - P_2 = 45.1\% - 23.8\% = 21.3\%$. The overall percentage response = $P = (566 + 313)/(1256 + 1314) = 34.2\%$.

If the null hypothesis is true, then the only reason that $P_1 - P_2$ differs from 0 is sampling variation. Under the null hypothesis we are assuming that the two samples of size $n_1 = 1256$ and $n_2 = 1314$ are random samples of people with equal true probabilities of response π.

We need to calculate the standard error of the difference in two percentages: $\sqrt{P(100 - P)(1/n_1 + 1/n_2)} = \sqrt{34.2 \times 65.8 \times (1/1256 + 1/1314)} = 1.9\%$.

Now we compare the observed difference with the standard error of the difference, simply by dividing one by the other. Thus, we compute Z = (observed difference in percentages) / (standard error of difference) = $21.3 / 1.9 = 11.2$.
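The steps of this test can be sketched in code. The function below works from the raw counts (566 smokers among 1256 men, 313 among 1314 women) and uses the standard error based on the overall proportion, as the null hypothesis requires. Because it does not round intermediate values, it gives Z of about 11.3 rather than the 11.2 obtained from the rounded standard error of 1.9%. The function name `two_proportion_z` is an illustrative choice.

```python
from math import sqrt

def two_proportion_z(x1, n1, x2, n2):
    """Return (difference, overall proportion, SE under the null, Z)."""
    p1, p2 = x1 / n1, x2 / n2
    overall = (x1 + x2) / (n1 + n2)  # overall proportion of responders
    se = sqrt(overall * (1 - overall) * (1 / n1 + 1 / n2))
    difference = p1 - p2
    return difference, overall, se, difference / se

diff, overall, se, z = two_proportion_z(566, 1256, 313, 1314)
print(f"difference = {100 * diff:.1f}%, overall = {100 * overall:.1f}%")
print(f"SE of difference = {100 * se:.2f}%, Z = {z:.1f}")
```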

How large does Z have to be in order for us to assert that we have strong evidence that the null hypothesis is untrue? We need to make use of the fact that the difference between two observed proportions has approximately a Normal distribution, since this enables us to convert any value of Z into a probability P (as we have already learnt in previous sessions).

Z exceeds 0.674 (in either direction) with probability 0.5.

In our example, Z = 11.2 and so the probability P is (substantially) less than 0.001. That means that, if the proportion of smokers is the same among men and women, the chance of getting such a big percentage difference in our study is less than 1 in 1000. We therefore have strong evidence that the proportion of smokers in men and women in the defined population is different (and is lower in women). We may also say that the difference between the percentages is statistically significant at the 0.1% level.
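Converting Z into a probability can be done without tables. The sketch below uses the standard Normal distribution through `math.erfc`, on the assumption that the probability of interest is two-sided (a difference at least this large in either direction). The function name `two_sided_p` is an illustrative choice.

```python
from math import erfc, sqrt

def two_sided_p(z):
    """P(|Z| >= z) for a standard Normal variable Z."""
    return erfc(abs(z) / sqrt(2))

for z in (0.674, 1.96, 11.2):
    print(f"Z = {z}: P = {two_sided_p(z):.3g}")
```

For Z = 0.674 the result is close to 0.5 and for Z = 1.96 it is close to 0.05. For Z = 11.2 it is far below 0.001.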

Exercise: we want to know whether smoking depends on marital status.

Smoking   Married        Unmarried      Total
Yes       732 (34.1%)    147 (34.9%)    879 (34.2%)
No
Total

The observed difference in percentages is ___. The standard error of the difference (using the formula given above) is ___. Z = ___. P = ___.

95% confidence interval for a difference in two percentages: While giving the actual P-value is useful, we also need to estimate the magnitude of the difference and express the uncertainty in that estimate using a confidence interval. The 95% confidence interval for the difference between two percentages is: observed difference ± 1.96 × standard error of difference.

In the calculation of the confidence interval, the formula for the standard error of the difference does not assume the null hypothesis of the two proportions being equal. A slightly different formula is used for the standard error: SE (difference in proportions) = $\sqrt{P_1(1 - P_1)/n_1 + P_2(1 - P_2)/n_2}$.

In our study, for the smoking difference between men and women, the 95% confidence interval is $21.3\% \pm 1.96 \times 1.83\%$ = 17.7% to 24.9%.
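The interval for the difference between men and women can be reproduced from the raw counts. This sketch uses the standard error that does not assume equal proportions, $\sqrt{P_1(1 - P_1)/n_1 + P_2(1 - P_2)/n_2}$. Because it works from unrounded proportions, its limits can differ from the slide's figures in the first decimal place. The function name `difference_ci_95` is an illustrative choice.

```python
from math import sqrt

def difference_ci_95(x1, n1, x2, n2):
    """Return (difference, SE, lower limit, upper limit) for P1 - P2."""
    p1, p2 = x1 / n1, x2 / n2
    # Standard error without assuming that the two proportions are equal.
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    difference = p1 - p2
    return difference, se, difference - 1.96 * se, difference + 1.96 * se

# 566 smokers among 1256 men, 313 smokers among 1314 women.
diff, se, lower, upper = difference_ci_95(566, 1256, 313, 1314)
print(f"difference = {100 * diff:.1f}%, SE = {100 * se:.2f}%")
print(f"95% CI: {100 * lower:.1f}% to {100 * upper:.1f}%")
```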

Exercise: Calculate the 95% confidence interval for the difference in the percentage of smokers among married and unmarried individuals. SE = 2.54%; CI = 0.8% ± 1.96 × 2.54% = −4.2% to 5.8%.

Note that if such a 95% confidence interval for a difference includes the value 0.0 (i.e. one limit is positive and the other is negative), then P is greater than 0.05. Conversely, if the 95% confidence interval does not include 0.0, then P is less than 0.05. This illustrates that there is a close link between significance testing and confidence intervals.
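The link between the two approaches can be illustrated with the two comparisons from this session. The sketch below makes a simplifying assumption: it uses one standard error for both the interval and the test, whereas the slides use slightly different formulas for each. The differences and standard errors (21.3% and 1.9%; 0.8% and 2.54%) are the rounded figures quoted earlier, and the function name `ci_and_p` is an illustrative choice.

```python
from math import erfc, sqrt

def ci_and_p(difference, se):
    """Return (lower limit, upper limit, two-sided P) for a difference."""
    lower = difference - 1.96 * se
    upper = difference + 1.96 * se
    p_value = erfc(abs(difference / se) / sqrt(2))  # P(|Z| >= observed Z)
    return lower, upper, p_value

# Simplification: the same SE is used for the interval and for the test.
examples = {"men vs women": (21.3, 1.9), "unmarried vs married": (0.8, 2.54)}
results = {}
for label, (difference, se) in examples.items():
    lower, upper, p_value = ci_and_p(difference, se)
    results[label] = (lower, upper, p_value)
    includes_zero = lower <= 0 <= upper
    print(f"{label}: CI {lower:.1f}% to {upper:.1f}%, "
          f"includes 0: {includes_zero}, P < 0.05: {p_value < 0.05}")
```

In the first comparison the interval excludes 0 and P is below 0.05. In the second the interval includes 0 and P is above 0.05.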