
1 Categorical Data Analysis Review for Final
Sucharita & Cookie Spring 2013

2 Overview
Binary Logistic Regression
Resampling

3 Odds & Odds Ratio
Odds (see CD06, p. 6):
  The odds that a woman drinks are 524/358 = 1.464.
  Interpretation: when the odds are > 1, the frequency of ‘A’ is larger than the frequency of ‘not A’.
Odds ratio:
  Used to measure the association between two dichotomous variables.
  An effect size (but not easy to interpret).
  Interpretation of the odds ratio:
    (1) The odds of A in Group 1 are _____ times the odds of A in Group 2.
    (2) The odds of A in Group 2 are (1 / _____) times the odds of A in Group 1.
  Expected odds ratio when there is no effect: 1.00.
  Calculate the significance of odds ratios with chi-square or with Wald (binary logistic regression).
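The odds and odds-ratio arithmetic above can be sketched in a few lines of Python. The 524/358 counts come from the slide; the counts for the second group are hypothetical and appear only to show that the two interpretations of an odds ratio are reciprocals of each other.

```python
def odds(count_a, count_not_a):
    """Odds of A: frequency of A divided by frequency of not-A."""
    return count_a / count_not_a


def odds_ratio(a1, not_a1, a2, not_a2):
    """Odds of A in group 1 divided by odds of A in group 2."""
    return odds(a1, not_a1) / odds(a2, not_a2)


# Counts from the slide: women who drink vs. women who do not.
women_odds = odds(524, 358)

# Hypothetical counts for a comparison group (not from the slides).
men_drink, men_not = 600, 300

or_women_vs_men = odds_ratio(524, 358, men_drink, men_not)
or_men_vs_women = odds_ratio(men_drink, men_not, 524, 358)

print(f"odds for women: {women_odds:.3f}")
print(f"OR women vs. men: {or_women_vs_men:.3f}")
print(f"OR men vs. women: {or_men_vs_women:.3f} (the reciprocal)")
```

An odds ratio of 1.00 would mean the two groups have identical odds, which is the no-effect value named on the slide.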

4 Binary Logistic Regression
When:
  The DV is a binary categorical variable.
  Predictors are categorical, continuous, or both.
Important statistics:
  Wald statistic (p06 p.25): used to determine the statistical significance of a variable
    Wald Alone: dummy vs. all other dummies, ignoring all other variables
    Wald at Entry: dummy vs. referent group, controlling for all other variables
  Odds ratio at Final Model: effect sizes
Use Wald for determining the statistical significance of the variable
  Distributed approximately as chi-square
Use the B weight for determining the probability of a specific outcome
  The B weight itself is not very useful to interpret because it is a logged value (a log odds)
  Two-step process: calculate the U value, then the probability
Interpretation of the odds ratio: “The predicted odds of outcome 1 versus outcome 0 are [exp(B)] times as great for the indicator group as for the reference group”
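A minimal sketch of the two-step process, using the B weights reported for the sex-only model later in this review (constant = -.415, B for sex = .030). It assumes sex enters the model as an indicator with male = 0 and female = 1, which matches the later statement that the constant gives the odds for men; the function name is illustrative.

```python
import math


def predicted_probability(u):
    """Step 2: convert the linear predictor U (a log odds) to a probability."""
    return math.exp(u) / (1 + math.exp(u))


# B weights from the sex-only model reported later in the review.
constant = -0.415
b_sex = 0.030

# Step 1: compute U for each group (assumed coding: male = 0, female = 1).
u_men = constant + b_sex * 0
u_women = constant + b_sex * 1

# Step 2: turn each U into a predicted probability of "trusted".
p_men = predicted_probability(u_men)
p_women = predicted_probability(u_women)

print(f"P(trusted | men)   = {p_men:.3f}")
print(f"P(trusted | women) = {p_women:.3f}")
print(f"odds ratio, women vs. men = {math.exp(b_sex):.3f}")
```

The probabilities differ only slightly, which fits the non-significant Wald test for sex, and exp(B) recovers the odds ratio directly without going through the probabilities.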

5 DATA
World Values Survey (between Sep 19th and Sep 29th 2006) (N = 1174)
Predicted variable: Generally speaking, would you say that most people can be trusted or that you need to be very careful in dealing with people?
  Dichotomous DV: 0 = need to be careful, 1 = trusted
Predictors of interest:
  gender (1 = male, 2 = female)
  age (continuous variable)
  general political party preference (1 = Republican, 2 = Democrat, 3 = Independent, 4 = non-partisan)
Source: Jiin & Andrew’s 2012 final review slides

6 Check for the reference group
Reference group for political party preference?
Republicans are the reference group here (0 on every parameter).

Categorical Variables Codings
                               Frequency   Parameter coding
                                           (1)     (2)     (3)
politicalparty  Republican     338         .000    .000    .000
                Democrat       387         1.000   .000    .000
                Independent    160         .000    1.000   .000
                non-partisan   289         .000    .000    1.000
Sex             male           596         .000
                female         578         1.000
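As an illustration of what "0 on everything" means, the sketch below builds indicator (dummy) coding for the four party categories with Republican as the reference group. The helper name `indicator_coding` is hypothetical and this is not SPSS's own procedure, only the same coding idea written out.

```python
def indicator_coding(categories, reference):
    """Return one 0/1 parameter per non-reference category.

    The reference category scores 0 on every parameter; each other
    category scores 1 on its own parameter and 0 elsewhere.
    """
    non_reference = [c for c in categories if c != reference]
    return {
        c: [1.0 if c == other else 0.0 for other in non_reference]
        for c in categories
    }


parties = ["Republican", "Democrat", "Independent", "non-partisan"]
coding = indicator_coding(parties, reference="Republican")

for party, parameters in coding.items():
    print(f"{party:<13} {parameters}")
```

Each coefficient in the later tables then compares one party with the all-zero row, that is, with Republicans.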

7 Variables in the Equation
Research question Q1: Did more or fewer respondents say that most people can be trusted than say that they need to be very careful in dealing with people?
Dichotomous DV: 0 = need to be careful, 1 = trusted
Step 1: output → Block 0
Step 2: report Wald (df = 1, N = 1174) = 45.239, p < .001, Exp(B) = .670
Step 3: interpretation → The odds of trusting are .670. In other words, significantly fewer respondents said that most people can be trusted than said that they need to be very careful in dealing with people.
Odds of trusted = # trusted / # non-trusted = 471/703 = .670
Odds of non-trusted = # non-trusted / # trusted = 703/471 = 1.49

Variables in the Equation
                        B       S.E.    Wald     df   Sig.   Exp(B)
Step 0  Constant        -.400   .060    45.239   1    .000   .670
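The Block 0 numbers can be reproduced from the two counts on this slide. In a model with only a constant, Exp(B) is the observed odds and B is its natural log; the sketch below checks that relationship (variable names are illustrative).

```python
import math

# Counts from the slide.
trusted = 471
careful = 703

odds_trusted = trusted / careful    # reported as Exp(B) in Block 0
odds_careful = careful / trusted    # the reciprocal odds
b_constant = math.log(odds_trusted) # reported as B for the constant

print(f"N = {trusted + careful}")
print(f"odds of trusted     = {odds_trusted:.3f}")
print(f"odds of non-trusted = {odds_careful:.2f}")
print(f"B (log odds)        = {b_constant:.3f}")
```

Because the odds are below 1, B is negative, which is why the constant has a minus sign in the table.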

8 Variables in the Equation
Research question Q2: How is sex related to respondents’ perception that most people can be trusted?
Step 1: Output 03 → Block 1: Wald (df = 1, N = 1174) = .063, p = .802, Exp(B) = 1.030
Step 2: interpretation → Sex was not significantly related to respondents’ perception that others can be trusted.
What are the odds of women saying others can be trusted compared to the odds of men saying so? The odds of women saying others can be trusted are 1.030 times the odds of men saying others can be trusted.
What are the odds of the men saying the same? The odds of men saying others can be trusted vs. not trusted are .660.

Variables in the Equation
                        B       S.E.    Wald     df   Sig.   Exp(B)
Step 1a Sex             .030    .119    .063     1    .802   1.030
        Constant        -.415   .084    24.617   1    .000   .660
a. Variable(s) entered on step 1: V235.

9 Variables in the Equation
Research question Q3: How is age related to respondents’ perception that most people can be trusted?
Step 1: Output 02 → Block 1: Wald (df = 1, N = 1174) = 22.463, p < .001, Exp(B) = 1.017
Step 2: interpretation → Age was significantly related to respondents’ perception that others can be trusted. On average, for each one-year increase in age, the predicted odds of trusting others are 1.017 times as great. Older people are more likely than younger people to say that others can be trusted, on average.
On average, with a one-unit decrease in age, what are the odds of trusting others? 1/1.017 = .983 times as great.

Variables in the Equation
                        B       S.E.    Wald     df   Sig.   Exp(B)
Step 1a Age             .016    .003    22.463   1    .000   1.017
        Constant        -1.220  .179    46.250   1    .000   .295
a. Variable(s) entered on step 1: V237.
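For a continuous predictor, the odds multiplier for a change of k units is Exp(B) raised to the power k, so a one-year decrease is the k = -1 case. A small sketch using the Exp(B) for age from this slide; the 10-year comparison is an added illustration, not a result from the slides.

```python
def odds_multiplier(exp_b, change_in_units):
    """Factor by which the predicted odds change for a given change in X."""
    return exp_b ** change_in_units


exp_b_age = 1.017  # Exp(B) for age, from the slide

one_year_older = odds_multiplier(exp_b_age, 1)
one_year_younger = odds_multiplier(exp_b_age, -1)   # same as 1 / 1.017
ten_years_older = odds_multiplier(exp_b_age, 10)    # illustrative only

print(f"+1 year:   odds x {one_year_older:.3f}")
print(f"-1 year:   odds x {one_year_younger:.3f}")
print(f"+10 years: odds x {ten_years_older:.3f}")
```

The multipliers compound, so the 10-year factor is larger than ten times the one-year increment of .017 would suggest.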

10 Variables in the Equation
Research question Q4: How is political party preference related to respondents’ perception that most people can be trusted?
Step 1: Output 01 → Block 1: Wald (df = 3, N = 1174) = 7.261, p = .064
Step 2: interpretation → Overall, political party preference was not significantly related to respondents’ perception that others can be trusted. However, Democrats are less likely than Republicans to say that others can be trusted, Wald = 5.590, p < .05, Exp(B) = .699. The odds of Democrats saying others can be trusted are .699 times the odds for Republicans. Furthermore, non-partisans are less likely than Republicans to say that others can be trusted, Wald = 4.916, p < .05, Exp(B) = .696. The odds of non-partisans saying others can be trusted are .696 times the odds for Republicans.

Variables in the Equation
                        B       S.E.    Wald     df   Sig.   Exp(B)
Step 1a politicalparty                  7.261    3    .064
        Democrats       -.358   .151    5.590    1    .018   .699
        Independents    -.304   .196    2.412    1    .120   .738
        Non-partisan    -.362   .163    4.916    1    .027   .696
        Constant        -.154   .109    1.996    1    .158   .857
a. Variable(s) entered on step 1: politicalparty.
Reference group: Republican. (1) Democrat, (2) Independent, (3) non-partisan
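Because the Wald statistic is approximately chi-square distributed, the Sig. column can be checked from the Wald and df values. The sketch below uses closed-form chi-square tail probabilities for df = 1 and df = 3 only, so it needs nothing beyond the standard library; the function name is illustrative and other df values are deliberately not handled.

```python
import math


def wald_p_value(wald, df):
    """Upper-tail chi-square probability for df = 1 or df = 3."""
    if df == 1:
        return math.erfc(math.sqrt(wald / 2))
    if df == 3:
        return (math.erfc(math.sqrt(wald / 2))
                + math.sqrt(2 * wald / math.pi) * math.exp(-wald / 2))
    raise ValueError("this sketch only covers df = 1 and df = 3")


# (B, Wald, df) taken from the table on this slide; B is None for the
# overall test because no single coefficient belongs to it.
rows = {
    "politicalparty": (None, 7.261, 3),
    "Democrats": (-0.358, 5.590, 1),
    "Independents": (-0.304, 2.412, 1),
    "Non-partisan": (-0.362, 4.916, 1),
}

for name, (b, wald, df) in rows.items():
    p = wald_p_value(wald, df)
    exp_b = "" if b is None else f"Exp(B) = {math.exp(b):.3f}"
    print(f"{name:<15} Wald = {wald:<6} df = {df}  p = {p:.3f}  {exp_b}")
```

The overall df = 3 test asks whether party matters at all, while each df = 1 row compares one party with Republicans, which is how the overall test can miss .05 while two of the individual contrasts fall below it.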

11 Variables in the Equation
Research question Q5: Does political party preference contribute beyond age and sex to predicting one’s perception that most people can be trusted?
Step 1: Output 03 → Block 1: Wald (df = 3, N = 1174) = 7.295, p = .063
Step 2: interpretation → Controlling for sex and age, political party preference was not significantly related to respondents’ perception that others can be trusted. However, Democrats are less likely than Republicans to say that others can be trusted, Wald = 6.755, p < .01, Exp(B) = .671. The odds of Democrats saying others can be trusted are .671 times the odds for Republicans. The odds ratio of non-partisans to Republicans became non-significant when controlling for sex and age.

Variables in the Equation
                        B       S.E.    Wald     df   Sig.   Exp(B)
Step 1a Sex             .070    .121    .334     1    .563   1.073
        Age             .016    .004    19.360   1    .000   1.016
        politicalparty                  7.295    3    .063
        Democrat        -.399   .153    6.755    1    .009   .671
        Independents    -.309   .197    2.452    1    .117   .734
        Non-partisan    -.287   .166    2.994    1    .084   .751
        Constant        -.968   .221    19.186   1    .000   .380
a. Variable(s) entered on step 1: politicalparty.

12 Resampling: When do you use it?
When parametric tests cannot be employed:
  The assumptions for using a parametric test have been violated.
  You want statistics not typically used in parametric tests (e.g., medians), especially since the distribution may not be normal, in which case medians may be more useful.

13 Howell’s Program: How do you test the null hypothesis?
Two approaches: randomization and bootstrapping.
Randomization (sampling without replacement): the CI is built around 0 (the null). Thus, if our median difference (or other statistic) lies outside this interval, we reject the null.
Bootstrapping (sampling with replacement): the CI is built around the statistic obtained, so if 0 is not included in the interval, we reject the null.
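The contrast between the two approaches can be sketched for a difference in medians. This is a generic percentile-interval sketch, not Howell's program; the two score lists, the seed, and the number of resamples are all made up for illustration.

```python
import random
from statistics import median


def percentile_interval(values, alpha=0.05):
    """Middle (1 - alpha) share of the resampled statistics."""
    ordered = sorted(values)
    n = len(ordered)
    return ordered[int((alpha / 2) * n)], ordered[int((1 - alpha / 2) * n) - 1]


def randomization_interval(a, b, reps=2000, seed=1):
    """Shuffle group labels (sampling without replacement): interval around 0."""
    rng = random.Random(seed)
    pooled = list(a) + list(b)
    diffs = []
    for _ in range(reps):
        rng.shuffle(pooled)
        diffs.append(median(pooled[len(a):]) - median(pooled[:len(a)]))
    return percentile_interval(diffs)


def bootstrap_interval(a, b, reps=2000, seed=1):
    """Resample each group with replacement: interval around the statistic."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        resample_a = rng.choices(a, k=len(a))
        resample_b = rng.choices(b, k=len(b))
        diffs.append(median(resample_b) - median(resample_a))
    return percentile_interval(diffs)


# Hypothetical scores for two groups (not from the slides).
group_a = [12, 15, 11, 18, 14, 13, 16, 12, 17, 15]
group_b = [25, 28, 22, 30, 27, 24, 29, 26, 31, 23]

observed = median(group_b) - median(group_a)
rand_low, rand_high = randomization_interval(group_a, group_b)
boot_low, boot_high = bootstrap_interval(group_a, group_b)

print(f"observed median difference: {observed}")
print(f"randomization interval (around 0): [{rand_low}, {rand_high}]")
print(f"bootstrap interval (around the statistic): [{boot_low}, {boot_high}]")
print("reject by randomization:", not (rand_low <= observed <= rand_high))
print("reject by bootstrap:", not (boot_low <= 0 <= boot_high))
```

With groups this far apart, both decision rules reject the null: the observed difference falls outside the randomization interval, and the bootstrap interval does not contain 0.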

14 NHST & p: Elements of p-value explanation
If Ho is true and all assumptions are met, the probability of getting results this extreme or more extreme is [p-value].
Ho is never true (Cohen, 1994).

15 T vs. F about p-value p = .01… True or False
“There is a 1% chance that the decision to reject Ho is wrong.” → FALSE
“Assuming Ho is true and the study is repeated many times, about 1% of these results will be even more inconsistent with Ho than the observed result.” → TRUE
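The TRUE statement can be illustrated by simulation: when Ho really is true, roughly 1% of repeated studies produce p < .01. The sketch assumes a two-sided one-sample z test with known standard deviation; the sample size, number of repetitions, and seed are arbitrary choices.

```python
import random
from statistics import NormalDist


def two_sided_p(sample, mu0, sigma):
    """Two-sided p-value for a one-sample z test with known sigma."""
    sample_mean = sum(sample) / len(sample)
    z = (sample_mean - mu0) / (sigma / len(sample) ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))


rng = random.Random(2013)
reps = 10_000
sample_size = 25

# Every simulated study draws from a population where Ho (mean = 0) is true.
p_values = [
    two_sided_p([rng.gauss(0, 1) for _ in range(sample_size)], mu0=0, sigma=1)
    for _ in range(reps)
]

share_below_01 = sum(p < 0.01 for p in p_values) / reps
print(f"share of studies with p < .01 when Ho is true: {share_below_01:.4f}")
```

The simulation says nothing about the probability that a particular rejection is wrong, which depends on how often Ho is true in the first place; that is why the first statement is false.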

16 More T vs. F about p-value
The p value is the probability that the null hypothesis is true.
The p value is the probability that a finding is merely due to luck.
The p value is the probability of falsely rejecting the null hypothesis.
1 − p is the probability that a replicating experiment would yield the same conclusion.
The p value is the probability that a replicating experiment would not yield the same conclusion.
1 − p is the probability of the alternative hypothesis being true.
The p value indicates the size or importance of the observed effect.
All false!
You ran a study and got p = .01. If somebody replicates the study, how likely is it that they get p less than .01? → 50%
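The 50% answer follows under one specific assumption, which the sketch makes explicit: the true effect is taken to equal the effect observed in the original study, and the replication uses the same design and sample size. Under that assumption the replication's z statistic is centered exactly on the original z, so it lands beyond it about half the time.

```python
from statistics import NormalDist

std_normal = NormalDist()

observed_p = 0.01
# z value that gives a two-sided p of .01.
z_observed = std_normal.inv_cdf(1 - observed_p / 2)

# Assumption: the true effect equals the observed one, so the
# replication's z statistic is distributed N(z_observed, 1).
replication_z = NormalDist(mu=z_observed, sigma=1)

# The replication gives p < .01 when |z| exceeds z_observed.
prob_smaller_p = (1 - replication_z.cdf(z_observed)) + replication_z.cdf(-z_observed)

print(f"z for p = .01: {z_observed:.3f}")
print(f"P(replication p < .01): {prob_smaller_p:.3f}")
```

If the true effect were smaller than the observed one, the probability would fall below 50%, so 1 − p is far from a replication probability.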

17 More T vs. F about p-value
True or false: a smaller p-value indicates a larger effect. Explain.
True if all the conditions are the same (same population, same sample size, same variables): then a smaller p-value indicates a larger effect size.
False when you compare p-values from different populations, different sample sizes, or different variables.
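A short numeric illustration of why the comparison only works when conditions match. It assumes a one-sample z test with known standard deviation, where z equals the standardized effect size d times the square root of n; the effect sizes and sample sizes are hypothetical.

```python
from statistics import NormalDist


def p_value_for(effect_size_d, n):
    """Two-sided p for a one-sample z test: z = d * sqrt(n)."""
    z = effect_size_d * n ** 0.5
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Same sample size, different effects: the larger effect has the smaller p.
p_small_effect = p_value_for(0.2, 25)
p_large_effect = p_value_for(0.5, 25)

# Same small effect, much larger sample: p shrinks with no change in effect.
p_small_effect_big_n = p_value_for(0.2, 400)

print(f"d = 0.2, n = 25:  p = {p_small_effect:.4f}")
print(f"d = 0.5, n = 25:  p = {p_large_effect:.4f}")
print(f"d = 0.2, n = 400: p = {p_small_effect_big_n:.6f}")
```

The smallest p-value here belongs to the smaller effect, purely because of its larger sample.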

18 99% Confidence Interval means… (True or False?)
There is a 99% chance that your interval captures the population mean, but we are not sure! → True
The population mean falls between the CLs 99% of the time. → False: the population mean is an unknown fixed value.
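A simulation makes the distinction concrete: the population mean stays fixed while the interval moves from sample to sample, and about 99% of those intervals capture it. The sketch assumes a known population standard deviation (so a z interval applies); the population values, sample size, and seed are invented.

```python
import random
from statistics import NormalDist

rng = random.Random(99)

# Hypothetical population: the mean is fixed and never changes.
true_mean, sigma = 50.0, 10.0
n, reps = 30, 5000

z_crit = NormalDist().inv_cdf(0.995)       # two-sided 99% critical value
half_width = z_crit * sigma / n ** 0.5     # same width for every sample

covered = 0
for _ in range(reps):
    sample_mean = sum(rng.gauss(true_mean, sigma) for _ in range(n)) / n
    # The interval is what varies from sample to sample.
    if sample_mean - half_width <= true_mean <= sample_mean + half_width:
        covered += 1

coverage = covered / reps
print(f"share of intervals capturing the fixed mean: {coverage:.3f}")
```

Any single interval either contains the mean or it does not; the 99% describes the procedure across repeated samples.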

19 Summer is almost here!

