Data analysis: cross-tabulation GAP Toolkit 5 Training in basic drug abuse data management and analysis Training session 11
Objectives
– To introduce cross-tabulation as a method of investigating the relationship between two categorical variables
– To describe the SPSS facilities for cross-tabulation
– To discuss a range of simple statistics that describe the relationship between two categorical variables
– To reinforce the range of SPSS skills learnt to date
Bivariate analysis The relationship between two variables A two-way table: –Rows: categories of one variable –Columns: categories of the second variable
Gender (frequency table)
Columns: Frequency, Percent, Valid Percent, Cumulative Percent
Rows: Valid (Male, Female, Total); Missing (System): 6 cases, 0.4%; Total
Mode of ingestion Drug 1 (frequency table)
Columns: Frequency, Percent, Valid Percent, Cumulative Percent
Rows: Valid (Swallow, Smoke, Snort, Inject, Total); Missing (System): 13 cases, 0.8%; Total
Annotation: out-of-range values (note that none of the digits exceeds 5)
Cleaning Mode1
– Save a copy of the original
– Recode the out-of-range values into a new value (for example, 12, 15, 23, 24, 25, 34 and 234 into the value 8)
– Set the new value as a user-defined missing value (for example, 8 is declared a missing value and given the label “Out-of-range”)
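The cleaning steps above can be sketched outside SPSS as well. The following is a minimal Python illustration, not the SPSS procedure itself: the raw values are hypothetical, `None` is assumed to stand for a system-missing value, and the names `recode_mode` and `USER_MISSING` are invented for the example.

```python
# Out-of-range values listed on the slide, recoded into the single value 8.
OUT_OF_RANGE_VALUES = {12, 15, 23, 24, 25, 34, 234}
OUT_OF_RANGE_CODE = 8
USER_MISSING = {OUT_OF_RANGE_CODE}   # codes treated as user-defined missing

def recode_mode(value):
    """Return the cleaned code; None (system-missing) is passed through."""
    if value in OUT_OF_RANGE_VALUES:
        return OUT_OF_RANGE_CODE
    return value

# Hypothetical raw data, not the training data set.
mode1_raw = [1, 2, 12, 4, None, 234, 3, 25, 1]

mode1_original = list(mode1_raw)                    # step 1: keep a copy
mode1_clean = [recode_mode(v) for v in mode1_raw]   # step 2: recode into 8

# Step 3: exclude system-missing and user-missing codes from the valid cases.
valid = [v for v in mode1_clean if v is not None and v not in USER_MISSING]

print(mode1_clean)
print(valid)
```

Recoding into a new list rather than overwriting the raw values mirrors the advice to keep a copy of the original.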
Mode of ingestion Drug 1 (frequency table with out-of-range values declared missing)
Columns: Frequency, Percent, Valid Percent, Cumulative Percent
Rows: Valid (Swallow, Smoke, Snort, Inject, Total); Missing: Out-of-range 38 (2.4%), System 13 (0.8%), Total 51 (3.2%); Total
Mode of ingestion Drug1 * Gender cross-tabulation (Count)
Rows: Mode of ingestion Drug1 (Swallow, Smoke, Snort, Inject, Total)
Columns: Gender (Male, Female, Total)
Annotations: joint frequencies (cells), row totals, column totals, grand total
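To make the parts of a cross-tabulation concrete, here is a small Python sketch that counts joint frequencies, row totals, column totals and the grand total. The records are hypothetical and do not come from the training data set.

```python
from collections import Counter

# Hypothetical (mode of ingestion, gender) pairs, one per respondent.
records = [
    ("Swallow", "Male"), ("Smoke", "Male"), ("Swallow", "Female"),
    ("Smoke", "Male"), ("Inject", "Female"), ("Swallow", "Male"),
]

joint = Counter(records)                                # joint frequencies (cells)
row_totals = Counter(mode for mode, _ in records)       # one total per mode
col_totals = Counter(gender for _, gender in records)   # one total per gender
grand_total = len(records)

print(joint[("Swallow", "Male")], row_totals["Swallow"],
      col_totals["Male"], grand_total)
```

The row totals and the column totals each add up to the grand total, which is a quick consistency check on any cross-tabulation.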
Percentages
– The difference in sample size between men and women makes comparison of raw numbers difficult
– Percentages facilitate comparison by standardizing the scale
– There are three options for the denominator of the percentage:
  – Grand total
  – Row total
  – Column total
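The three denominators lead to three different tables of percentages. The sketch below computes each of them in Python from a small table of hypothetical counts; the helper name `percentages` is invented for the illustration.

```python
# Hypothetical joint frequencies (rows: mode of ingestion, columns: gender).
counts = {
    "Swallow": {"Male": 30, "Female": 10},
    "Smoke": {"Male": 50, "Female": 10},
}

row_totals = {mode: sum(row.values()) for mode, row in counts.items()}
col_totals = {}
for row in counts.values():
    for gender, n in row.items():
        col_totals[gender] = col_totals.get(gender, 0) + n
grand_total = sum(row_totals.values())

def percentages(denominator):
    """Build a table of percentages; denominator(mode, gender) picks the base."""
    return {
        mode: {gender: round(100 * n / denominator(mode, gender), 1)
               for gender, n in row.items()}
        for mode, row in counts.items()
    }

pct_of_total = percentages(lambda mode, gender: grand_total)
pct_within_mode = percentages(lambda mode, gender: row_totals[mode])
pct_within_gender = percentages(lambda mode, gender: col_totals[gender])

print(pct_of_total)
print(pct_within_mode)
print(pct_within_gender)
```

Percentages of the grand total describe the joint distribution, while row and column percentages describe one variable conditional on the other.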
Mode of ingestion Drug1 * Gender cross-tabulation (% of Total)
Mode of ingestion Drug1   Male     Female   Total
Swallow                   39.6%    12.8%    52.4%
Smoke                     36.5%     5.1%    41.6%
Snort                      2.9%     1.1%     4.0%
Inject                     1.3%     0.7%     2.0%
Total                     80.3%    19.7%   100.0%
Annotations: the cells give the joint distribution of Mode1 and Gender; the Total column gives the marginal distribution of Mode1; the Total row gives the marginal distribution of Gender
Mode of ingestion Drug1 * Gender cross-tabulation (% within Mode of ingestion Drug1)
Mode of ingestion Drug1   Male     Female   Total
Swallow                   75.6%    24.4%   100.0%
Smoke                     87.8%    12.2%   100.0%
Snort                     72.1%    27.9%   100.0%
Inject                    66.7%    33.3%   100.0%
Total                     80.3%    19.7%   100.0%
Annotation: the distribution of Gender conditional on Mode1
Mode of ingestion Drug1 * Gender cross-tabulation (% within Gender)
Mode of ingestion Drug1   Male     Female   Total
Swallow                   49.3%    65.1%    52.4%
Smoke                     45.4%    25.8%    41.6%
Snort                      3.6%     5.7%     4.0%
Inject                     1.6%     3.4%     2.0%
Total                    100.0%   100.0%   100.0%
Annotation: the distribution of Mode1 conditional on Gender
Choosing percentages “Construct the proportions so that they sum to one within the categories of the explanatory variable.” Source: (C. Marsh, Exploring Data: An Introduction to Data Analysis for Social Scientists (Cambridge, Polity Press, 1988), p )
Dimensions Definitions of vertical and horizontal variables
Two-by-two tables
– Tables with two rows and two columns
– A range of simple descriptive statistics can be applied to two-by-two tables
– It is possible to collapse larger tables to these dimensions
Gender * White pipe cross-tabulation (% within Gender)
Gender    Yes      No       Total
Male      23.2%    76.8%   100.0%
Female     7.0%    93.0%   100.0%
Total     19.9%    80.1%   100.0%
Gender by White pipe (two-by-two table): rows Gender (Male, Female); columns White pipe (Yes, No)
Relative risk
– Divide the probabilities of “success” in the two groups
– For example: P(Whitpipe=Yes|Gender=Male) = 0.2318 and P(Whitpipe=Yes|Gender=Female) = 0.0701
– Relative risk is 0.2318/0.0701 ≈ 3.31 (reported as 3.309)
– The proportion of males using white pipe was over three times that of females
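The relative risk calculation can be sketched in a few lines of Python. The cell counts are not given in the text; the counts below are an assumption, chosen only because they are consistent with the reported percentages and with the 1,565 valid cases, and the function name `relative_risk` is invented for the example.

```python
def relative_risk(success_a, total_a, success_b, total_b):
    """Ratio of the probability of "success" in group A to that in group B."""
    return (success_a / total_a) / (success_b / total_b)

# Assumed counts (not from the source): white pipe users and group sizes.
male_yes, male_total = 290, 1251
female_yes, female_total = 22, 314

p_male = male_yes / male_total        # about 0.2318
p_female = female_yes / female_total  # about 0.0701
rr = relative_risk(male_yes, male_total, female_yes, female_total)

print(round(p_male, 4), round(p_female, 4), round(rr, 3))
```

Working from counts rather than rounded probabilities avoids the small rounding differences that appear when the division is done by hand.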
Odds
– The odds of “success” are the ratio of the probability of “success” to the probability of “failure”
– For example:
  – For males the odds of “success” are 0.2318/0.7682 = 0.302
  – For females the odds of “success” are 0.0701/0.9299 = 0.075
Odds ratio
– Divide the odds of “success” for males by the odds of “success” for females
– For example: 0.302/0.075 ≈ 4.0 (reported as 4.005)
– The odds of a male taking white pipe are about four times those of a female
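The odds and the odds ratio follow the same pattern. This Python sketch again uses assumed cell counts that are consistent with the reported percentages; they are not taken from the source, and the function name `odds` is invented for the example.

```python
def odds(successes, failures):
    """Odds of "success": successes divided by failures."""
    return successes / failures

# Assumed counts (not from the source), consistent with the reported percentages.
male_yes, male_no = 290, 961
female_yes, female_no = 22, 292

odds_male = odds(male_yes, male_no)
odds_female = odds(female_yes, female_no)
odds_ratio = odds_male / odds_female

print(round(odds_male, 3), round(odds_female, 3), round(odds_ratio, 3))
```

Dividing counts gives the same odds as dividing the probabilities of “success” and “failure”, because the group size cancels out.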
Risk estimate (SPSS output)
Columns: Value; 95% Confidence interval (Lower, Upper)
Rows:
– Odds ratio for Gender (Male / Female): the odds ratio M/F
– For cohort white pipe = Yes: the relative risk of “success”
– For cohort white pipe = No: the relative risk of “failure”
– N of valid cases: 1565
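The text does not say how the confidence limits are computed. One common approach, shown here as an assumption rather than as the SPSS method, is a large-sample interval on the log scale: the standard error of the log odds ratio is the square root of the sum of the reciprocals of the four cell counts. The counts are again assumed values consistent with the reported percentages.

```python
import math

# Assumed cell counts (not from the source): a, b = male yes/no; c, d = female yes/no.
a, b, c, d = 290, 961, 22, 292

odds_ratio = (a * d) / (b * c)

# Standard error of ln(odds ratio) for a two-by-two table.
se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)

z = 1.96  # normal quantile for a 95% interval
lower = math.exp(math.log(odds_ratio) - z * se_log_or)
upper = math.exp(math.log(odds_ratio) + z * se_log_or)

print(round(odds_ratio, 3), round(lower, 2), round(upper, 2))
```

If the interval excludes 1, the data are not consistent, at the 95% level, with equal odds in the two groups.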
Exercise 1: cross-tabulations
– Create and comment on the following cross-tabulations:
  – Age vs Gender
  – Race vs Gender
  – Education vs Gender
  – Primary drugs vs Mode of ingestion
– Suggest other cross-tabulations that would be useful
Exercise 2: cross-tabulation
– Construct a dichotomous variable for age: Up to 24 years and Above 24 years
– Construct a dichotomous variable for the primary drug of use: Alcohol and Not Alcohol
– Create a cross-tabulation of the two new variables and interpret it
– Generate relative risks and odds ratios and interpret them
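As a starting point for Exercise 2, the dichotomization step might look like the following Python sketch. The records, the drug labels and the helper names `age_group` and `drug_group` are all hypothetical; in SPSS the same result would be obtained by recoding into new variables.

```python
from collections import Counter

# Hypothetical records: (age in years, primary drug of use).
records = [
    (19, "Alcohol"), (31, "Cannabis"), (24, "Cannabis"),
    (45, "Alcohol"), (22, "Heroin"), (38, "Alcohol"),
]

def age_group(age):
    """Collapse age into two categories, with 24 in the lower group."""
    return "Up to 24" if age <= 24 else "Above 24"

def drug_group(drug):
    """Collapse the primary drug into Alcohol versus everything else."""
    return "Alcohol" if drug == "Alcohol" else "Not Alcohol"

# Two-by-two table of joint frequencies for the new variables.
table = Counter((age_group(age), drug_group(drug)) for age, drug in records)

for cell, n in sorted(table.items()):
    print(cell, n)
```

The resulting two-by-two table is the input needed for the relative risk and odds ratio calculations.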
Summary
– Cross-tabulations
– Joint frequencies
– Marginal frequencies
– Row/Column/Total percentages
– Relative risk
– Odds
– Odds ratios