© Willett, Harvard University Graduate School of Education, 6/23/2016
S010Y/C04 – Slide 1

S010Y: Answering Questions with Quantitative Data
Class 4: II.2 Examining A Relationship Between Categorical Variables

Good research is a partnership of questions and data.

What types of data are collected? “Categorical” data and “continuous” data.
What kinds of question can be asked of those data? “Descriptive” questions and “relational” questions.

“Descriptive” questions, categorical data:
    How many members of the class are women?
    What proportion of the class is full-time?
    …?

“Descriptive” questions, continuous data:
    How tall are class members, on average?
    How many hours a week do class members report that they study?
    …?

“Relational” questions, categorical data:
    Are men more likely to study part-time?
    Are women more likely to enroll in USP?
    …?

“Relational” questions, continuous data:
    Do people who say they study for more hours think they’ll finish their doctorate earlier?
    Are computer literates less anxious about statistics?
    …?

So, in our DEATHPEN data (2475 cases), an important relational question is:

RQ: Is it more probable that a convicted murderer will be sentenced to death in Georgia if he kills someone Black or if he kills someone White?

This is a question about whether variable DEATH is “related to” variable RVICTIM in the DEATHPEN dataset. In other words, are the contents of the DEATH column somehow connected with the contents of the RVICTIM column in some meaningful way?

How can we best address this kind of relational question? This is what we’ll think about in today’s class. As a part of it, and of all future analyses of relationships, we will have to figure out:
    How to DISPLAY a relationship between two categorical variables, with a block chart.
    How to DESCRIBE the relationship, by computing pairs of sample percentages.
    How to SUMMARIZE the relationship in a single statistical index.

Let’s begin by displaying the RVICTIM and DEATH variables simultaneously in some way. I’ve done this in Data-Analytic Handout Class04.1. Here’s the PC-SAS program from this handout:

    OPTIONS Nodate Pageno=1;
    TITLE1 'A010Y: Answering Questions with Quantitative Data';
    TITLE2 'Class 4/Handout 1: Displaying Relationships Between Categorical Variables';
    TITLE3 'Death penalty and race bias in Georgia';
    TITLE4 'Data in DEATHPEN.txt';

    * Input data, name and label variables in dataset;
    DATA DEATHPEN;
        INFILE 'C:\DATA\A010Y\DEATHPEN.txt';
        INPUT DEATH RDEFEND RVICTIM;
        LABEL DEATH   = 'Sentenced to death?'
              RDEFEND = 'Race of defendant'
              RVICTIM = 'Race of victim';
    RUN;

    * Format labels for values of categorical variables;
    PROC FORMAT;
        VALUE DLABELS 0 = 'No'
                      1 = 'Yes';
        VALUE RLABELS 1 = 'Black'
                      2 = 'White';
    RUN;

    * Display relationship between DEATH and RVICTIM;
    PROC CHART DATA=DEATHPEN;
        TITLE5 'Block Chart of the Relationship Between DEATH and RVICTIM';
        FORMAT DEATH DLABELS. RVICTIM RLABELS.;
        BLOCK RVICTIM / GROUP=DEATH DISCRETE;
    RUN;

The first part of the PC-SAS program (the options, the titling, the data-input and the formatting sections) is identical to what you saw earlier.

The last part of the PC-SAS program contains the instructions for the data analysis, as before, but now I use PROC CHART to display the values of the RVICTIM and DEATH variables jointly. Let’s look at it in greater detail.

Here’s the data-analysis part of the program, the part that produces the joint display of the values of the RVICTIM and DEATH variables:

    * Display relationship between DEATH and RVICTIM;
    PROC CHART DATA=DEATHPEN;
        TITLE5 'Block Chart of the Relationship Between DEATH and RVICTIM';
        FORMAT DEATH DLABELS. RVICTIM RLABELS.;
        BLOCK RVICTIM / GROUP=DEATH DISCRETE;
    RUN;

Again, we invoke PROC CHART on the DEATHPEN dataset.

Again, we add a new fifth title to clarify what is being plotted.

Again, we format both categorical variables being analyzed so that their values are labeled appropriately on the output.

We still ask for a BLOCK chart of the values of the RVICTIM variable. But, this time, we include an option after the forward slash (/), which requests that a block chart of RVICTIM be created for each value of the variable DEATH. This is achieved by adding the option GROUP=DEATH. It causes two block charts of the variable RVICTIM to be produced and lined up one behind the other.

Here’s the resulting parallel block chart of the values of RVICTIM, grouped by the values of variable DEATH …

[Figure: parallel block chart of RVICTIM, grouped by DEATH]

In our sample of convicted murderers in Georgia, when a Black person was a victim …

In our sample of convicted murderers in Georgia, when a White person was a victim …

What’s the story here? Would you say that DEATH and RVICTIM are related?

Combining the two summary percentages …
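
The pair of sample percentages can also be tabulated directly rather than read off the block chart. The step below is a minimal sketch and an assumption on my part, not part of Handout Class04.1: it uses PROC FREQ on the DEATHPEN dataset, with the DLABELS. and RLABELS. formats from the program above, to print the percentage sentenced to death within each category of RVICTIM.

```sas
* Hypothetical follow-up step, not taken from the handout;
* Assumes the DEATHPEN dataset and the DLABELS. and RLABELS. formats already exist;
PROC FREQ DATA=DEATHPEN;
    FORMAT DEATH DLABELS. RVICTIM RLABELS.;
    * Rows are RVICTIM and columns are DEATH;
    * NOCOL and NOPERCENT leave only the frequencies and the row percentages;
    TABLES RVICTIM*DEATH / NOCOL NOPERCENT;
RUN;
```

With RVICTIM as the row variable, each row percentage in the "Yes" column is the sample percentage of murderers sentenced to death for that category of victim, which is the pair of percentages being compared.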

What does it actually mean to say that “two variables are related”?

To answer the question “Are DEATH and RVICTIM related in our observed sample?”, it might help to imagine, hypothetically, what the DEATH by RVICTIM block chart would look like if there really were NO relationship in the sample.
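
One way to make the hypothetical "no relationship" chart concrete is to compute the frequency we would expect in each block if DEATH and RVICTIM were unrelated. The notation here is an assumption introduced for illustration and does not appear on the slides: $O_{ij}$ is the observed frequency in row $i$ (a category of RVICTIM) and column $j$ (a category of DEATH), $R_i$ and $C_j$ are the row and column totals, and $n$ is the sample size.

```latex
E_{ij} = \frac{R_i \, C_j}{n},
\qquad R_i = \sum_{j} O_{ij},
\quad C_j = \sum_{i} O_{ij},
\quad n = \sum_{i}\sum_{j} O_{ij}
```

Dividing by the row total gives $E_{ij}/R_i = C_j/n$: under no relationship, the proportion sentenced to death is the same for both categories of victim and equals the proportion in the sample as a whole.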

What do you think … Is there a relationship between DEATH and RVICTIM?

How can we summarize, in a single index, the magnitude of the relationship between DEATH and RVICTIM in our data?

If there were no relationship between DEATH and RVICTIM in the sample, we might anticipate that there would be no “net discrepancy” between the observed and expected sample frequencies.

Conversely, we might argue that the bigger the “net discrepancy” between the observed and expected frequencies, the “stronger” the relationship between DEATH and RVICTIM.

Can we devise a sensible index of “net discrepancy” between the observed and expected frequencies in the sample?

How can we summarize, in a single index, the magnitude of the relationship between DEATH and RVICTIM in our data?

One potential “Index of Net Discrepancy” between the observed and expected frequencies?
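
A natural first candidate, which may or may not be the one shown on the original slide, is simply to add up the raw discrepancies over all the cells. The short derivation below shows why that cannot work. The symbols are assumptions introduced for illustration: $O_{ij}$ and $E_{ij}$ are the observed and expected frequencies in row $i$ and column $j$, with $E_{ij} = R_i C_j / n$ computed from the row totals $R_i$, column totals $C_j$ and sample size $n$.

```latex
\sum_{i}\sum_{j} \left( O_{ij} - E_{ij} \right)
  = n - \sum_{i}\sum_{j} \frac{R_i \, C_j}{n}
  = n - \frac{\left(\sum_i R_i\right)\left(\sum_j C_j\right)}{n}
  = n - \frac{n \cdot n}{n}
  = 0
```

Positive and negative discrepancies always cancel exactly, however strong the relationship, so a useful index has to remove the signs before summing.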

How can we summarize, in a single index, the extent or magnitude of the relationship between DEATH and RVICTIM in our data?

Another potential “Index of Net Discrepancy” between the observed and expected frequencies?
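
A standard index of this kind is the Pearson chi-square statistic. It is offered here as a sketch of one sensible choice, on the assumption that it is the kind of index the slide has in mind; the source text does not show the formula. As before, $O_{ij}$ and $E_{ij}$ are assumed symbols for the observed and expected frequencies in each cell.

```latex
\chi^2 = \sum_{i}\sum_{j} \frac{\left( O_{ij} - E_{ij} \right)^2}{E_{ij}}
```

Squaring stops positive and negative discrepancies from canceling, and dividing by $E_{ij}$ expresses each squared discrepancy relative to the frequency expected in that cell. The statistic is zero when observed and expected frequencies agree exactly and grows as they diverge.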

How can we summarize, in a single index, the extent or magnitude of the relationship between DEATH and RVICTIM in our data?

A “Decision Rule” for Deciding Whether There Is A Relationship Between Two Categorical Variables?
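
If the index chosen is the Pearson chi-square statistic, one conventional decision rule is to compare it with a critical value. The step below is a minimal sketch under that assumption and is not taken from the handout: it asks PROC FREQ for the expected frequencies and the chi-square statistics for the DEATH by RVICTIM table, reusing the DEATHPEN dataset and the DLABELS. and RLABELS. formats from the program above.

```sas
* Hypothetical follow-up step, not taken from the handout;
* Assumes the DEATHPEN dataset and the DLABELS. and RLABELS. formats already exist;
PROC FREQ DATA=DEATHPEN;
    FORMAT DEATH DLABELS. RVICTIM RLABELS.;
    * EXPECTED prints the expected frequency in each cell;
    * CHISQ prints the chi-square statistics and their p-values;
    TABLES RVICTIM*DEATH / EXPECTED CHISQ NOROW NOCOL NOPERCENT;
RUN;
```

For a table with two rows and two columns there is (2 - 1) x (2 - 1) = 1 degree of freedom, and the conventional critical value at the 0.05 level is about 3.84. Under this rule we would conclude that the two variables are related when the chi-square statistic exceeds that value or, equivalently, when its p-value falls below 0.05. The rule developed in class may be stated differently.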