Testing hypotheses using CATEGORICAL VARIABLES

Assessing relationships between categorical variables

Hypothesis: a formal statement of cause and effect, derived from the research question and literature review
  Example: Better demeanor → more leniency
To test a relationship between categorical variables, we build TWO tables
Frequencies table
  Dependent variable values in COLUMNS
  Independent variable values in ROWS
  Frequencies (number of cases with the corresponding values) in the cells
Percentages table
  Calculated row by row, top to bottom
  ROW percentages: each row totals 100%
This is the only acceptable way to build tables in this class. You must do it this way to earn credit on the exams.

Frequencies
                        Officer’s Disposition
Youth’s demeanor    Admonish  Reprimand  Citation  Arrest     n
Cooperative               24         15         4       2    45
Uncooperative              1          1         5      14    21

Row percentages
                        Officer’s Disposition
Youth’s demeanor    Admonish  Reprimand  Citation  Arrest  Total
Cooperative              53%        33%        9%      5%   100%
Uncooperative             5%         5%       23%     67%   100%

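The row-by-row percentaging can be sketched in a few lines of Python. This is only an illustration, not part of the class materials: the function name `row_percentages` is made up here, and the Uncooperative row is read as 1, 1, 5, 14 (n = 21), which is an assumption about the flattened table.

```python
# Frequencies from the demeanor/disposition table.
# Rows are IV values; columns are DV values in the order listed below.
dispositions = ["Admonish", "Reprimand", "Citation", "Arrest"]
frequencies = {
    "Cooperative": [24, 15, 4, 2],
    "Uncooperative": [1, 1, 5, 14],  # assumed reading of the flattened row
}


def row_percentages(row):
    """Convert one row of counts to percentages of that row's own total."""
    total = sum(row)
    return [100 * count / total for count in row]


# Percentage each row separately, so every row totals 100%.
percentages = {
    demeanor: row_percentages(row) for demeanor, row in frequencies.items()
}

for demeanor, row in percentages.items():
    cells = "  ".join(f"{d}: {p:.0f}%" for d, p in zip(dispositions, row))
    print(f"{demeanor:<14}{cells}")
```

Because the code rounds each cell on its own, a printed value can differ by a point from the figures on the slide.
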
Recoding continuous to categorical variables

Hypothesis: higher income persons drive more expensive cars
Independent variable: Income, measured categorically (nominal variable)
  Two values: low income and high income
  Income is measured by where a car is parked: student lot (low income) or faculty-staff lot (high income)
Dependent variable: Car value, measured categorically (ordinal variable)
  Low, Medium, High
Sampling
  Stratified, disproportionate, systematic sampling of 10 cars from a student lot and 10 cars from a faculty lot
Coding
  Income is automatically coded by car location (faculty-staff or student lot)
  Car values must be recoded from the original five-point scale to three categories: Low, Medium, High

Step 1: Recoding
Recode the continuous 1-5 car value measure into three categories, L/M/H
  1-2: LOW
  3: MED
  4-5: HIGH

                          DV - Car value
IV - Income           LOW   MED   HIGH     n
LOW (student lot)      10     0      0    10
HIGH (F/S lot)          4     0      6    10

Step 2: Percentaging
Convert frequencies into percentages
Convert each row separately so the cells add to 100 percent

                          DV - Car value
IV - Income           LOW   MED   HIGH  Total
LOW (student lot)    100%    0%     0%   100%
HIGH (F/S lot)        40%    0%    60%   100%

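Steps 1 and 2 can be sketched in Python as follows. The slides do not list the raw five-point scores, so the `raw_scores` below are hypothetical values, chosen only so that the recode reproduces the frequencies shown (reading the F/S lot row as 4 LOW and 6 HIGH, itself an assumption). The helper name `recode` is also made up for this sketch.

```python
from collections import Counter

CATEGORIES = ["LOW", "MED", "HIGH"]


def recode(score):
    """Collapse a five-point car value score into LOW / MED / HIGH."""
    if score in (1, 2):
        return "LOW"
    if score == 3:
        return "MED"
    if score in (4, 5):
        return "HIGH"
    raise ValueError(f"score out of range: {score}")


# HYPOTHETICAL raw scores (not from the slides), ten cars per lot.
raw_scores = {
    "LOW (student lot)": [1, 2, 2, 1, 1, 2, 1, 2, 2, 1],
    "HIGH (F/S lot)": [2, 5, 4, 1, 5, 4, 2, 4, 5, 1],
}

# Step 1: recode each score and count the cases in each category.
frequency_table = {}
for income, scores in raw_scores.items():
    counts = Counter(recode(s) for s in scores)
    frequency_table[income] = [counts[c] for c in CATEGORIES]

# Step 2: percentage each row separately so it adds to 100.
percentage_table = {
    income: [100 * count / sum(row) for count in row]
    for income, row in frequency_table.items()
}

for income, row in percentage_table.items():
    cells = "  ".join(f"{c}: {p:.0f}%" for c, p in zip(CATEGORIES, row))
    print(f"{income:<20}{cells}")
```
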
                          DV - Car value
IV - Income           LOW   MED   HIGH  Total
LOW (student lot)    100%    0%     0%   100%
HIGH (F/S lot)        40%    0%    60%   100%

Step 3: Distributions
Draw the distributions and examine their shapes
[Figure: distributions of car value for the student lot (L) and the faculty lot (L, H)]

Step 4: Analysis
Switch values of the independent variable. Does the distribution of cases substantially change in the predicted direction?
  All the cars in the student lot are low value. Sixty percent of the cars in the F/S lot are high value.
  As we “switch” values of the IV, the distribution changes in the hypothesized direction, with larger proportions of more expensive cars in the F/S lot.
Hypothesis confirmed!

An example where the hypothesis is NOT confirmed

[Figure: distributions of car value for the student lot and the faculty lot]

                          DV - Car value
IV - Income           LOW   MED   HIGH  Total
LOW (student lot)     30%   20%    50%   100%
HIGH (F/S lot)        40%    0%    60%   100%

Differences between rows are relatively slight
For both income levels, the distribution is biased toward higher car values
Is there a more accurate way to assess possible differences? You bet! That’s where we’re heading…

Hypothesis: poverty → crime

Remember: if there appears to be an effect, is it in the hypothesized direction?
IV: poverty, measured by income. DV: crime, measured by arrests
  Income has two values, low and high
  Arrests has two values, never arrested and arrest record
To test the hypothesis, switch from one category of the IV to the other.
  Does the distribution of cases along the DV change?
  Is the change substantial?
  Is the change in the hypothesized direction?

               Never Arrested   Arrest Record   Total
Low Income                50%             50%    100%
High Income               50%             50%    100%
Distribution is unaffected. There seems to be no connection between income and arrest record. The hypothesis is rejected.

               Never Arrested   Arrest Record   Total
Low Income                20%             80%    100%
High Income               80%             20%    100%
Distribution flips in the expected direction. High income persons seem much less likely to have an arrest record. The hypothesis is confirmed.

               Never Arrested   Arrest Record   Total
Low Income                80%             20%    100%
High Income               20%             80%    100%
Distribution flips in the opposite direction. High income persons seem much more likely to have an arrest record. The hypothesis is rejected.

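One way to make the “switch the IV” test concrete is to compute the percentage-point change in arrest records when moving from low to high income. The sketch below assumes the High Income rows read 50/50, 80/20 and 20/80 across the three scenarios; the function name `arrest_gap` is made up here. The hypothesis (poverty → crime) predicts a negative change.

```python
# Percent with an arrest record, by income level, in each scenario.
# High-income values are an assumed reading of the flattened tables.
scenarios = {
    "unaffected": {"Low": 50, "High": 50},
    "expected direction": {"Low": 80, "High": 20},
    "opposite direction": {"Low": 20, "High": 80},
}


def arrest_gap(table):
    """Percentage-point change in arrest records, low income to high income."""
    return table["High"] - table["Low"]


for name, table in scenarios.items():
    gap = arrest_gap(table)
    if gap == 0:
        direction = "no change"
    elif gap < 0:
        direction = "hypothesized direction"
    else:
        direction = "opposite direction"
    print(f"{name:<20}{gap:+d} points ({direction})")
```

The sign answers “is it in the hypothesized direction?”; whether the size of the change is “substantial” is still a judgment call at this stage.
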
Elaboration analysis

Zero-order table
First-order partial tables
  Replication
  Explanation
  Specification

Could another IV be at play?

Remember our poverty → crime example?
Intervening variable: Could lack of education or living in a violent area be the more proximate (closer) cause of crime?
  Poverty → poor education → crime
  Here poverty still affects crime, but mostly through the intervening variable education, which is the more proximate cause
Spurious relationship: What seems to be a relationship isn’t. It’s bogus!
  Often caused by a strong association between the independent variable of interest (e.g., poverty) and another independent variable (e.g., poor social controls), which turns out to be the real cause
  Poor social controls → crime
  Poverty ↛ crime

PRACTICAL EXERCISE
Hypothesis: Higher rank → Less cynicism
Using “elaboration analysis” to determine if another IV might be at play

Sample of 100 officers and 100 supervisors
  Twenty officers scored low on cynicism; 80 were high cynicism
  Fifty supervisors scored low on cynicism; 50 were high cynicism
Build a frequency table, then convert it to percentages
  Be sure to place the categories of the dependent variable in columns, and the categories of the independent variable in rows

PRACTICAL EXERCISE
Hypothesis: Higher rank → Less cynicism

                 Cynicism
Rank           Low   High      n
Officers        20     80    100
Supervisors     50     50    100
Total                        200

                 Cynicism
Rank           Low   High  Total
Officers       20%    80%   100%
Supervisors    50%    50%   100%

Well, that was easy! Looks like the hypothesis (higher rank, less cynicism) is confirmed
Of course, we can’t stop here. There are many variables floating around. Are there any other variables that are related to our independent variable, rank, and which could possibly affect the dependent variable, cynicism?
A literature review reveals that gender is associated with rank, and that gender may also affect cynicism.
  Gender could be an “intervening” variable, mediating the relationship with cynicism…
    Rank → Gender → Cynicism
  Or, it could be the real cause of changes in cynicism
    If it were, the apparent (but non-existent) relationship between rank and cynicism would be called “spurious”
How do we disentangle the possibilities? Coming up!

Hypothesis: Higher rank → Less cynicism
Elaboration analysis: using first-order partial tables to analyze the effects of a “control” variable

Let’s “elaborate” (dig deeper). Does the effect of rank on cynicism hold regardless of gender?
  Gender is used as a “control” variable. We will test the original, “zero-order” relationship between rank and cynicism, “controlling” for each value of gender.
  Gender is categorical, so we keep using tables
Create two tables, one with frequencies and the other with percentages, for each value of the control variable gender
  Tables are just like the original, but each pair only includes cops of one gender (M or F)
These tables are called “first-order partial tables” because…
  “First order”: It’s our first “control” variable
  “Partial”: Each table only includes cases at one value of the control variable gender (Male or Female)
  To distinguish, the original tables are called “zero-order”
Data for gender
  MALES: Officers: 10 low cynicism, 50 high. Supervisors: 35 low, 35 high
  FEMALES: Officers: 10 low, 30 high. Supervisors: 15 low, 15 high
Analyze the percentage tables. What do you find?

“Zero-order” tables

                 Cynicism
Rank           Low   High      n
Officers        20     80    100
Supervisors     50     50    100
Total                        200

                 Cynicism
Rank           Low   High  Total
Officers       20%    80%   100%
Supervisors    50%    50%   100%

First-order partial tables

Original “zero-order” tables

                 Cynicism
Rank           Low   High      n
Officers        20     80    100
Supervisors     50     50    100
Total                        200

                 Cynicism
Rank           Low   High  Total
Officers       20%    80%   100%
Supervisors    50%    50%   100%

First-order partial tables: Males

                 Cynicism
Rank           Low   High      n
Officers        10     50     60
Supervisors     35     35     70
Total                        130

                 Cynicism
Rank           Low   High  Total
Officers       17%    83%   100%
Supervisors    50%    50%   100%

First-order partial tables: Females

                 Cynicism
Rank           Low   High      n
Officers        10     30     40
Supervisors     15     15     30
Total                         70

                 Cynicism
Rank           Low   High  Total
Officers       25%    75%   100%
Supervisors    50%    50%   100%

Both levels of the control variable gender, male and female, yield about the same findings as the zero-order table. Knowing an officer’s gender changes nothing. We have replicated the zero-order findings. Gender is not a factor in rank → cynicism. Our hypothesis is confirmed.
That’s called: REPLICATION

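The partial tables for gender can be built and checked with a short Python sketch. The counts come from the exercise data; the names `partials` and `percent_table` are made up for this illustration. Adding the partial tables cell by cell should give back the zero-order table, which is a handy check on the arithmetic.

```python
# (low cynicism, high cynicism) counts by rank, one table per gender.
partials = {
    "Males": {"Officers": (10, 50), "Supervisors": (35, 35)},
    "Females": {"Officers": (10, 30), "Supervisors": (15, 15)},
}


def percent_table(table):
    """Row percentages for a {rank: (low, high)} frequency table."""
    result = {}
    for rank, (low, high) in table.items():
        n = low + high
        result[rank] = (100 * low / n, 100 * high / n)
    return result


# Summing the partial tables cell by cell recovers the zero-order table.
zero_order = {
    rank: tuple(sum(partials[g][rank][i] for g in partials) for i in range(2))
    for rank in ("Officers", "Supervisors")
}

for label, table in [("Zero-order", zero_order), *partials.items()]:
    print(label)
    for rank, (low, high) in percent_table(table).items():
        print(f"  {rank:<12} low {low:.0f}%  high {high:.0f}%")
```
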
Hypothesis: Higher rank → Less cynicism

But we’re not done yet! Our literature review suggested that another variable associated with rank, time on the job, may also affect cynicism.
Let’s “control” for time on the job. Here’s the data:
  LESS THAN FIVE YEARS ON THE JOB
    Officers: 0 low cynicism, 75 high cynicism
    Supervisors: 2 low cynicism, 40 high cynicism
  MORE THAN FIVE YEARS ON THE JOB
    Officers: 20 low, 5 high
    Supervisors: 48 low, 10 high
Create first-order partial tables for time on the job, convert the tables to percentages, and analyze the results...

First-order partial tables

Original “zero-order” tables

                 Cynicism
Rank           Low   High      n
Officers        20     80    100
Supervisors     50     50    100
Total                        200

                 Cynicism
Rank           Low   High  Total
Officers       20%    80%   100%
Supervisors    50%    50%   100%

First-order partial tables: < 5 years

                 Cynicism
Rank           Low   High      n
Officers         0     75     75
Supervisors      2     40     42
Total                        117

                 Cynicism
Rank           Low   High  Total
Officers        0%   100%   100%
Supervisors     5%    95%   100%

First-order partial tables: 5+ years

                 Cynicism
Rank           Low   High      n
Officers        20      5     25
Supervisors     48     10     58
Total                         83

                 Cynicism
Rank           Low   High  Total
Officers       80%    20%   100%
Supervisors    83%    17%   100%

Both levels of the control variable time on the job are strongly associated with cynicism. This completely “explains” (wipes away) the apparent relationship between rank and cynicism. Our hypothesis (higher rank → less cynicism) is rejected. It’s less time on the job → more cynicism.
This outcome is called: EXPLANATION

Hypothesis: Higher rank → Less cynicism

But we’re not done yet! Our literature review suggested that another variable associated with rank, education, may also affect cynicism.
Let’s “control” for education. Here’s the data:
  HIGH SCHOOL
    Officers: 10 low cynicism, 50 high cynicism
    Supervisors: 9 low cynicism, 11 high cynicism
  COLLEGE
    Officers: 30 low, 10 high
    Supervisors: 65 low, 15 high
Create first-order partial tables for educational level, convert the tables to percentages, and analyze the results...

First-order partial tables

Original “zero-order” tables

                 Cynicism
Rank           Low   High      n
Officers        20     80    100
Supervisors     50     50    100
Total                        200

                 Cynicism
Rank           Low   High  Total
Officers       20%    80%   100%
Supervisors    50%    50%   100%

First-order partial tables: High school

                 Cynicism
Rank           Low   High      n
Officers        10     50     60
Supervisors      9     11     20
Total                         80

                 Cynicism
Rank           Low   High  Total
Officers       17%    83%   100%
Supervisors    45%    55%   100%

First-order partial tables: College

                 Cynicism
Rank           Low   High      n
Officers        30     10     40
Supervisors     65     15     80
Total                        120

                 Cynicism
Rank           Low   High  Total
Officers       75%    25%   100%
Supervisors    81%    19%   100%

For education level, our findings are split. The cynicism of high school grads resembles that of the zero-order table. But college-educated cops are much less cynical. One value of the control variable, education, teaches us something new.
This outcome is called: SPECIFICATION

Review: first-order partial analysis has three possible outcomes

First-order (and second-order, and so on) partial analyses yield three possible outcomes:
  Replication (first example): The original, “zero-order” relationship persists at each value of the control variable. Controlling for this variable teaches us nothing. The hypothesis is confirmed.
  Explanation (second example): The original, “zero-order” relationship is absent at each value of the control variable. The effect of the IV on the DV in the zero-order table has been “explained away” by the control variable. The hypothesis is rejected.
  Specification (final example): The zero-order relationship persists for some but not all values of the control variable. Controlling for this variable teaches us something. The hypothesis must be modified.

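To see the three outcomes side by side, the sketch below computes, for every partial table in the exercises, how many percentage points low cynicism rises when we switch from officers to supervisors. The counts are the exercise data; the names `controls`, `percent_low` and `rank_gap` are made up for this illustration, and no cutoff for what counts as “persists” is assumed.

```python
# (low cynicism, high cynicism) counts by rank, from the three exercises.
controls = {
    "Gender": {
        "Males": {"Officers": (10, 50), "Supervisors": (35, 35)},
        "Females": {"Officers": (10, 30), "Supervisors": (15, 15)},
    },
    "Time on the job": {
        "< 5 years": {"Officers": (0, 75), "Supervisors": (2, 40)},
        "5+ years": {"Officers": (20, 5), "Supervisors": (48, 10)},
    },
    "Education": {
        "High school": {"Officers": (10, 50), "Supervisors": (9, 11)},
        "College": {"Officers": (30, 10), "Supervisors": (65, 15)},
    },
}
zero_order = {"Officers": (20, 80), "Supervisors": (50, 50)}


def percent_low(counts):
    """Percent of a row's cases that scored low on cynicism."""
    low, high = counts
    return 100 * low / (low + high)


def rank_gap(table):
    """Percentage-point rise in low cynicism, officers to supervisors."""
    return percent_low(table["Supervisors"]) - percent_low(table["Officers"])


gaps = {
    control: {value: rank_gap(table) for value, table in tables.items()}
    for control, tables in controls.items()
}

print(f"Zero-order gap: {rank_gap(zero_order):.0f} points")
for control, by_value in gaps.items():
    summary = ", ".join(f"{value}: {gap:.0f}" for value, gap in by_value.items())
    print(f"{control:<16} {summary}")
```

Read against the zero-order gap of 30 points: both gender tables stay close to it (replication), both time-on-the-job tables shrink to a few points (explanation), and education keeps the gap for high school but not for college (specification).
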
But isn’t our analysis too “loosey-goosey”?

Assume there is a relationship between variables. When we “switch” the value of the IV, will the change in the DV always be obvious?
  No. And when the DV has multiple categories, such as in our parking lot exercise, visually discerning an effect can be difficult or impossible.
  Bottom line: changes in percentage are not enough.
Great. Now what?
  We can use the original frequencies table to build a second table, which projects what the frequencies would be if there were absolutely NO relationship between variables.
  We then compare the two tables with a “test statistic” (a statistic used to test hypotheses) known as “Chi-square”, χ².
  If this statistic is of sufficient magnitude (large enough), the tables are deemed “significantly” different, meaning that there IS a relationship between variables.
More on this during the third part of the semester!

               Cynicism - Males
Rank           Low   High  Total
Officers       17%    83%   100%
Supervisors    50%    50%   100%

               Cynicism - Males
Rank           Low   High      n
Officers        10     50     60
Supervisors     35     35     70
Total                        130

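As a preview, here is a minimal Python sketch of that comparison for the males table (Officers 10 low, 50 high; Supervisors 35 low, 35 high). It assumes the standard Pearson chi-square calculation: each expected count is row total × column total / N, and the statistic sums (observed − expected)² / expected over the cells. The variable names are made up for this sketch, and deciding whether the result is “large enough” is left for later in the semester.

```python
# Observed frequencies for males: rows = rank, columns = (low, high) cynicism.
observed = [
    [10, 50],  # Officers
    [35, 35],  # Supervisors
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count if there were NO relationship:
# row total * column total / N.
expected = [
    [r * c / grand_total for c in col_totals]
    for r in row_totals
]

# Chi-square: sum over all cells of (observed - expected)^2 / expected.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

print("Expected:", [[round(e, 1) for e in row] for row in expected])
print(f"Chi-square: {chi_square:.2f}")
```
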
Where are we headed?

Hypothesis testing requires that we assess the influence, on the DV, of your IV of interest, and of other variables that may be related to your IV and which could have their own effect on the DV.
Tables in your articles parcel out the effects of all variables on the DV.
A “test statistic” informs you exactly how much influence on the DV is exerted by your IV of interest and by its cousins.
We’ll be learning about various test statistics during the semester.
[Figures: example tables from articles. Dependent variable: arrest (Y/N). Dependent variable: death sentence (Y/N)]

REMINDER: The ONLY acceptable way to build a table in this class

Dependent variable values in COLUMNS
Independent variable values in ROWS
Frequencies (number of cases with the corresponding values) in the cells
Two tables
  One table contains the frequencies
  A second table, calculated from the first, contains ROW percentages
This is the only acceptable way to build tables in this class. You must do it this way to earn credit on the exams.

Better demeanor → More lenient disposition

                        Officer’s Disposition
Youth’s demeanor    Admonish  Reprimand  Citation  Arrest     n
Cooperative               24         15         4       2    45
Uncooperative              1          1         5      14    21

                        Officer’s Disposition
Youth’s demeanor    Admonish  Reprimand  Citation  Arrest  Total
Cooperative              53%        33%        9%      5%   100%
Uncooperative             5%         5%       23%     67%   100%