Categorical Data Analysis: When life fits in little boxes AnnMaria DeMars, PhD.
What are we going to do today? Basic statistics Logistic regression
An actual example Are you going to die soon?
Our data: Kaiser Permanente Study of the Oldest Old, and : [California]
DEPENDENT VARIABLE: dthflag = 1 if died during study period, 0 if alive at end of study period
Our data: PREDICTOR VARIABLES: nursehome = 0 if lived at home continuously, 1 if admitted to a nursing home at any time
We all knew PROC FREQ did this:
PROC FREQ DATA = dsname ;
  TABLES varname1 * varname2 / CHISQ ;
RUN ;
...and that with this you get:
– Chi-square value (several)
– Phi coefficient
– Fisher's exact test (where applicable)
Let's Start Simple: PROC FREQ
PROC FREQ DATA = in.old ;
  TABLES dthflag ;
RUN ;
Not Too Interesting… 55.4% of our sample died.
…So let's Dig Deeper: STACKED BAR CHART
Stacked Bar Chart with SAS Enterprise Guide
STACKED BAR Figure 1
STACKED BAR Figure 2
The Syntax
PROC GCHART DATA = mydata.oldpeople ;
  VBAR gender / SUBGROUP = dthflag TYPE = PCT INSIDE = PCT ;
RUN ;
QUIT ;
Let's Keep Digging! Association Measures
Enterprise Guide Method
The Syntax
PROC FREQ DATA = mydata.oldpeople ;
  TABLES dthflag * nursehome / NOROW NOPERCENT NOCUM CHISQ MEASURES ;
RUN ;
Nursing home placement by death Conditional probabilities
Being able to find SPSS in the start menu does not qualify you to perform a multinomial logistic regression
Chi-square results
The options & what they tell you
Chi-square results: Pearson $\chi^2 = \sum \dfrac{(f_o - f_e)^2}{f_e}$, where $f_o$ is the observed and $f_e$ the expected frequency in each cell
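To see the observed and expected frequencies that feed the Pearson formula, PROC FREQ can print them cell by cell. The following is a minimal sketch, assuming the same mydata.oldpeople dataset and the dthflag and nursehome variables used elsewhere in these slides; EXPECTED and CELLCHI2 are standard TABLES options.

```sas
/* Sketch: print the expected frequency (f_e) and each cell's
   contribution (f_o - f_e)^2 / f_e next to the observed count. */
PROC FREQ DATA = mydata.oldpeople ;
  TABLES dthflag * nursehome / CHISQ EXPECTED CELLCHI2 NOROW NOCOL NOPERCENT ;
RUN ;
```

Adding up the cell chi-square values over all cells of the table should reproduce the Pearson chi-square reported by CHISQ.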
Chi-square results
What is Fisher’s exact test & when do I get one?
Fisher’s Exact Test: the probability, under the null hypothesis of no relationship, of a table at least as unusual as the one that you have obtained.
With 2 x 2 Tables it’s automatic
Recap: Fisher’s Exact Test Small sample size OR Need exact probability
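For a 2 x 2 table the CHISQ option already includes Fisher's exact test. For a larger table it has to be requested, which might look like the sketch below. The variable name ervisits is hypothetical (it is not taken from the slides); the dataset and nursehome are the ones used in the earlier syntax.

```sas
/* Sketch: Fisher's exact test for a table larger than 2 x 2.
   ervisits is a hypothetical variable name used for illustration only. */
PROC FREQ DATA = mydata.oldpeople ;
  TABLES ervisits * nursehome / FISHER ;
RUN ;
```

Exact computations can be slow for large samples or tables with many cells, which is one reason the test is usually reserved for small samples.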
A bunch of things you may not know PROC FREQ does
Computing odds ratios
Divide the frequency in row 1, column 1 by the frequency in row 1, column 2: 2,486 / 184 = 13.51, the odds that a person who lived was not in a nursing home versus being in one.
Divide the frequency in row 2, column 1 by the frequency in row 2, column 2: 2,239 / 1,077 = 2.08
Divide the first result by the second: 13.51 / 2.08 = 6.49
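The same odds ratio can be obtained from PROC FREQ directly. The sketch below re-enters the four cell counts as summary data and asks for the odds ratio with RELRISK. It assumes the row 1, column 1 count is 2,486, the value consistent with the 13.51 quotient and with the 55.4% death rate reported earlier; the dataset name nh_counts and the variable count are made up for this illustration.

```sas
/* Sketch: one record per cell of the 2 x 2 table.
   nh_counts and count are illustrative names, not from the slides. */
DATA nh_counts ;
  INPUT dthflag nursehome count ;
  DATALINES ;
0 0 2486
0 1 184
1 0 2239
1 1 1077
;
RUN ;

/* WEIGHT tells PROC FREQ that each record stands for count people.
   RELRISK prints the odds ratio with its confidence limits. */
PROC FREQ DATA = nh_counts ;
  WEIGHT count ;
  TABLES dthflag * nursehome / CHISQ RELRISK ;
RUN ;
```

Computed from the unrounded counts, (2,486 x 1,077) / (184 x 2,239) is about 6.5, which matches the hand calculation up to rounding.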
Measures
Mantel-Haenszel chi-square: tests an ordinal (linear) relationship. Essentially the same as Pearson if each variable has only two categories.
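For reference, the Mantel-Haenszel statistic can be written in terms of the Pearson correlation $r$ between the row and column scores, with $n$ the total sample size. This is the standard textbook form, offered here as a sketch and not as output from the study data:

```latex
Q_{MH} = (n - 1)\, r^2 , \qquad \text{with 1 degree of freedom}

% For a 2 x 2 table, r^2 equals \chi^2_{Pearson} / n, so
Q_{MH} = \frac{n - 1}{n}\, \chi^2_{Pearson}
```

With a sample in the thousands, the factor $(n-1)/n$ is so close to 1 that the two values are practically indistinguishable.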
Ordinal relationship ?
Don’t just compare values
ER visits versus nursing home
More measures
Take-away
1. Different types of chi-square values, different types of correlations and other tests like odds ratios do exist.
2. These statistics are very easy to obtain using SAS.
3. While most times all of these measures will point you in the direction of the same general conclusion, there are times when one is preferable to the others.
Take-away 2: Non-standard hypotheses call for non-standard statistics
What about this?
PROC FREQ DATA = dsname ;
  TABLES varname / BINOMIAL (EXACT EQUIV P = .333) ALPHA = .05 ;
RUN ;
What’s it Do? The binomial (equiv p = .333) will produce a test that the population proportion is .333 for the first category. That is “No” for death. A Z-value will be produced, along with probabilities for one-tailed and two-tailed tests. The exact keyword will produce exact confidence intervals and, since I have specified alpha = .05, these will be the 95% confidence intervals.
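As a rough guide to where the Z-value comes from, the usual large-sample test of a single proportion has the form below. This is a sketch that assumes the standard error is computed under the null proportion $p_0 = .333$ (which I believe is the PROC FREQ default); $\hat{p}$ is the observed proportion in the first category and $n$ is the sample size.

```latex
z = \frac{\hat{p} - p_0}{\sqrt{p_0 (1 - p_0) / n}} , \qquad p_0 = .333
```

The further the observed proportion is from .333 relative to that standard error, the larger the absolute value of $z$ and the smaller the one-tailed and two-tailed probabilities.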
Again, Not New
Hmmm…. This is interesting
Null rejected !