Correlation, OLS (simple) regression, logistic regression, reading tables
Review – What are the odds?
“Test” statistics help us evaluate whether a relationship between variables goes beyond chance. If it does, one can reject the null hypothesis of no relationship. But in the social sciences, one cannot take more than five chances in one hundred of incorrectly rejecting the null hypothesis.

Here is how we proceed. Statistical software determines whether the test statistic’s coefficient (expressed numerically, such as .03) is of sufficient magnitude to reject the null hypothesis. How large must a coefficient be? That varies. In any case, if the software decides that it is large enough, it automatically assigns one, two or three asterisks (*, **, ***).

One asterisk is the minimum level required for rejecting the null hypothesis. It is known as p < .05, meaning less than five chances in 100 that a coefficient of that magnitude (size) could be produced by chance. If the coefficient is so large that the probability is less than one in one hundred that it was produced by chance, the software assigns two asterisks (**). An even better result is three asterisks (***), where the probability that the coefficient was produced by chance is less than one in a thousand.
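The asterisk convention above can be captured in a few lines of code. This is a generic sketch of the thresholds, not tied to any particular statistics package; the function name `stars` is ours.

```python
def stars(p):
    """Return the asterisk flag journals attach to a p-value."""
    if p < 0.001:
        return "***"   # less than 1 chance in 1,000
    if p < 0.01:
        return "**"    # less than 1 chance in 100
    if p < 0.05:
        return "*"     # less than 5 chances in 100; the minimum to reject the null
    return ""          # cannot reject the null hypothesis

print(stars(0.03))    # *
print(stars(0.0004))  # ***
```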
CORRELATION
Displays relationships between variables
Correlation: a simple association between variables, used when all are continuous.
r: the simple relationship between two variables. Coefficients range between -1 and +1 (0 = no relationship).
R: multiple correlation, the correlation among multiple variables (seldom used).
Software automatically tests correlations for statistical significance. To test hypotheses one must use regression (R2).
A correlation “matrix” displays the relationships between variables. “Sig. (2-tailed)” means that the significance of the relationship was computed without specifying the direction of the effect. A positive relationship means both variables rise and fall together.
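The r coefficient described above can be computed directly from its definition. A minimal pure-Python sketch; the variable names and data are hypothetical, invented for illustration:

```python
import math

def pearson_r(x, y):
    """Bivariate correlation coefficient: ranges from -1 to +1, 0 = no relationship."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

age = [5, 10, 15, 20, 25]             # hypothetical observations
weight = [40, 70, 115, 150, 160]
print(round(pearson_r(age, weight), 2))  # 0.98: positive, both rise together
```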
Correlation matrices
Data analysis often begins with a correlation matrix. Correlation matrices display the simple, “bivariate” relationships between every possible combination of continuous variables. Dependent variables are usually included. The same variables run in the same order down the left and across the top. When a variable intersects with itself, “1.00” is inserted as a placeholder.
Richard B. Felson and Jeremy Staff, “Explaining the Academic Performance-Delinquency Relationship,” Criminology (44:2, 2006)
REGRESSION
Test statistics
Regression (r2 and R2): independent and dependent variables are continuous. The b statistic is interpreted as the unit change in the DV for each unit change in the IV.
Logistic regression: independent variables are nominal or continuous; the dependent variable is nominal. Generates “b” and exp(b) (a.k.a. the odds ratio).
Chi-Square (X2): independent and dependent variables are categorical.
Difference between the means test (t statistic): the independent variable is dichotomous; the dependent variable is continuous.

Procedure | Level of measurement | Statistic | Interpretation
Regression | All variables continuous | r2, R2 | Proportion of change in the dependent variable accounted for by change in the independent variable(s).
Regression | All variables continuous | b | Unit change in the dependent variable caused by a one-unit change in the independent variable.
Logistic regression | DV nominal and dichotomous; IV’s nominal or continuous | b | Don’t try to interpret it directly; it’s on a logarithmic scale.
Logistic regression | DV nominal and dichotomous; IV’s nominal or continuous | exp(B) (odds ratio) | Odds that the DV will change if the IV changes one unit or, if the IV is dichotomous, if it changes its state.
Chi-Square | All variables categorical (nominal or ordinal) | X2 | Reflects the difference between observed and expected frequencies.
Difference between means | IV dichotomous, DV continuous | t | Reflects the magnitude of the difference.
Regression (ordinary - known as “OLS”)
DV and IV’s are continuous.
r2, the coefficient of determination: the proportion of change in the dependent variable accounted for by change in the independent variable.
R2: the same, but the summary effect of multiple IV’s on the DV.
b or B: the unit change in the DV for each unit change in the IV. Unlike r’s, which are on a scale of -1 to +1, b’s and B’s are not “standardized,” so they cannot be compared. Lowercase (b) refers to a sample; uppercase (B) refers to a population (no sampling). For our purposes it makes no difference whether b’s are lowercase or uppercase.
SE, the standard error: all coefficients include an error component. The greater this error, the less likely that the b or B will be statistically significant.

Hypothesis: Age → Weight. R2 = , B = 7.87, SE = , sig = .000
For each unit change in age (year), weight will change 7.87 units (pounds). Since the B is positive, age and weight go up and down together. The probability that the null hypothesis is true is less than 1 in 1,000.
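The b and r2 described above come straight from the least-squares formulas. A sketch with hypothetical data (the slide’s 7.87 figure comes from the author’s own data, which we don’t have, so the numbers below are invented):

```python
def ols(x, y):
    """Simple OLS regression: returns slope b and coefficient of determination r2."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    ssx = sum((a - mx) ** 2 for a in x)
    ssy = sum((c - my) ** 2 for c in y)
    sxy = sum((a - mx) * (c - my) for a, c in zip(x, y))
    b = sxy / ssx                  # unit change in DV per one-unit change in IV
    r2 = sxy ** 2 / (ssx * ssy)    # proportion of DV change accounted for by the IV
    return b, r2

age = [5, 10, 15, 20, 25]          # hypothetical sample
weight = [40, 70, 115, 150, 160]
b, r2 = ols(age, weight)
print(round(b, 1), round(r2, 2))   # 6.4 0.97: about 6.4 pounds per year of age
```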
Hypothesis: Age → Height

Age range unrestricted: B is positive (the variables go up and down together) and highly significant. R2 = .97, B = , SE = , sig = .000

Age range restricted: B is negative (as one variable goes up, the other goes down, a tiny bit!) and non-significant, because by age 20 we’re done growing. R2 = .07, B = , SE = .169, sig = .152
Another regression example
Hypothesis: observations of social disorder → perceptions of social disorder

Procedure: The dependent variable is understood; it is “embedded” in the table (here it is “citizen perceptions of social disorder,” a continuous measure). Independent variables normally run down the left column. Significant relationships (p < .05) are denoted two ways: with asterisks and/or a p-value column. When assessing a relationship, note whether the B or b is positive (no sign) or negative (- sign). The table’s columns are Independent variables, B, SE and p.

Joshua C. Hinkle and Sue-Ming Yang, “A New Look Into Broken Windows: What Shapes Individuals’ Perceptions of Social Disorder?,” Journal of Criminal Justice (42: 2014, 26-35)
R2 has siblings with similar interpretations
This hypothesis test used logistic regression (the DV is nominal, 0 or 1). The table’s columns are IV’s, B, Exp B, S.E. and p. The authors give coefficients for two R2 stand-ins, which supposedly report the same thing as R2. R2 reports the percentage of the change in the dependent variable that is accounted for by the changes in the independent variables, taken together. (It’s the IV’s total, “model” effect.)

Joshua C. Hinkle and Sue-Ming Yang, “A New Look Into Broken Windows: What Shapes Individuals’ Perceptions of Social Disorder?,” Journal of Criminal Justice (42: 2014, 26-35)
LOGISTIC REGRESSION
Logistic regression
Used when the dependent variable is nominal (i.e., two mutually exclusive categories, 0/1) and the independent variables are nominal or continuous.

Example: Richard B. Felson, Jeffrey M. Ackerman and Catherine A. Gallagher, “Police Intervention and the Repeat of Domestic Assault,” Criminology (43:3, 2005). Dependent variable: risk of a future assault (0,1).

b is the logistic regression coefficient, in log-odds units. A negative b indicates a negative relationship. It’s not otherwise easily interpretable.

Exp b, the “odds ratio,” is calculated from the b. It reports the effect on the dependent variable (DV) when a continuous IV changes one unit, or when a nominal IV changes from one value to the other. An Exp b of exactly 1 means no relationship: the odds are even (50/50) that as the IV changes one unit the DV will change. In other words, the chances of correctly predicting an effect on the DV from a change in the IV are no better than a coin toss. Exp b’s greater than 1 indicate a positive relationship; less than 1, a negative relationship.

Arrest decreases (negative b) the odds of repeat victimization by 22 percent ((1 - .78) × 100 = 22), or .78 times, but the effect is non-significant (no asterisk).
Not reported increases (positive b) the odds of repeat victimization by 89 percent ((1.89 - 1) × 100 = 89), or 1.89 times, a statistically significant change.
Prior victimization increases the odds of repeat victimization 408 percent ((5.08 - 1) × 100 = 408), or 5.08 times, also statistically significant.
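The percentage calculations above all follow one rule: subtract 1 from the odds ratio and multiply by 100. A small helper makes this explicit (the function name is ours):

```python
def percent_change(odds_ratio):
    """Convert exp(b), the odds ratio, into a percentage change in the odds."""
    return (odds_ratio - 1) * 100

print(round(percent_change(0.78)))  # -22: arrest lowers the odds 22 percent
print(round(percent_change(1.89)))  # 89: "not reported" raises them 89 percent
print(round(percent_change(5.08)))  # 408: prior victimization, 408 percent
```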
“Percent” v. “times”
Two times (2X) as large = 100% larger. Three times (3X) as large = 200% larger.
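“Times” and “percent larger” are two views of the same quantity. Hypothetical helper functions, named by us, make the conversion explicit:

```python
def times_to_percent_larger(times):
    """e.g., 3 times as large = 200 percent larger."""
    return (times - 1) * 100

def percent_larger_to_times(percent):
    """e.g., 100 percent larger = 2 times as large."""
    return percent / 100 + 1

print(times_to_percent_larger(2))    # 100
print(times_to_percent_larger(3))    # 200
print(percent_larger_to_times(200))  # 3.0
```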
Comparing levels of measurement for dependent variables
Regular (OLS) regression: the DV must be continuous. DV here: perception of social disorder (scale 0-14).
Logistic regression: the DV must be nominal (0/1). DV here: feeling unsafe (no = 0, yes = 1).
The OLS table’s columns are IV’s, B, S.E. and p; the logistic table adds an Exp B column.

Joshua C. Hinkle and Sue-Ming Yang, “A New Look Into Broken Windows: What Shapes Individuals’ Perceptions of Social Disorder?,” Journal of Criminal Justice (42: 2014, 26-35)
Logistic regression: measurement examples
DV, feeling unsafe: four response categories in the instrument were collapsed to 0 = no, 1 = yes.
Independent variables are continuous or nominal: scales (actual ranges such as 0-14, 0-18, 1-17, 3-142, 1-4 and 18-90, plus a percentage), yes/no dummies (1 = yes, 0 = no), and race dummies (1 = Black, 1 = Hispanic, 1 = other minority, each coded against 0 = White, the comparison group).

Joshua C. Hinkle and Sue-Ming Yang, “A New Look Into Broken Windows: What Shapes Individuals’ Perceptions of Social Disorder?,” Journal of Criminal Justice (42: 2014, 26-35)
Practical exercise: logistic regression
Effects of broken homes on future youth behavior. Main independent variable: broken home. Dependent variable: conviction for a crime of violence. Use Exp(B) and percentages to describe the effects of significant independent variables. Describe the levels of significance using words.

Delphine Theobald, David P. Farrington and Alex R. Piquero, “Childhood Broken Homes and Adult Violence: An Analysis of Moderators and Mediators,” Journal of Criminal Justice (41:1, 2013)
Youths from broken homes were 236 percent more likely to be convicted of a crime of violence. The effect was significant, with less than 1 chance in 100 that it was produced by chance. Youths with poor parental supervision were 128 percent more likely to be convicted of a violent crime. The effect was significant, with less than 5 chances in 100 that it was produced by chance.

Delphine Theobald, David P. Farrington and Alex R. Piquero, “Childhood Broken Homes and Adult Violence: An Analysis of Moderators and Mediators,” Journal of Criminal Justice (41:1, 2013)
Using logistic regression to analyze parking lot data
A different hypothesis: car value → parking lot
Independent variable: car value. Continuous, 1-5 scale.
Dependent variable: parking lot. Nominal, student lot = 0, faculty lot = 1.
b = 1.385*, Exp(B) = 4.0, sig. = .04
Effect: since b is positive, for each unit that car value increases (one step on the 1-5 scale), it is four times (300 percent) more likely that lot type will go from 0 (student) to 1 (faculty). This effect is consistent with the hypothesis. Calculation: (Exp B - 1) × 100 = 300 percent.
Probability that the null hypothesis is true: less than 5 chances in 100 (one asterisk, actual significance .038). Because its probability is less than 5 in 100, the null hypothesis is rejected and the working hypothesis is confirmed.
Car value → parking lot, Spring 2017
IV: car value, 1-5 scale. DV: parking lot, 0 = student lot, 1 = faculty lot.
Faculty lot car values: 4, 5, 2, 5, 2, 1, 4, 4, 5, 1

Panel | Student lot car values | b | Exp(b) | sig.
TTH 1 | 3, 3, 3, 4, 4, 2, 4, 2, 4, 3 | .07 | 1.1 | .86
TTH 2 | 1, 2, 2, 1, 1, 1, 2, 1, 2, 2 | 1.3* | 3.6 | .04
TTH 3 | 3, 2, 1, 5, 1, 2, 3, 3, 4, 2 | .35 | 1.4 | .28
TTH 4 | 1, 3, 3, 1, 1, 3, 3, 1, 4, 2 | .58 | 1.8 | .10
TTH 5 | 2, 2, 5, 1, 2, 4, 2, 4, 2, 1 | .38 | 1.5 | .24
TTH 6 | 5, 3, 3, 2, 1, 4, 2, 4, 5, 2 | | 1.0 | .76
F 1 | 3, 1, 4, 4, 2, 3, 1, 5, 4, 5 | .05 | 1.0 | .88
F 2 | 2, 2, 1, 2, 2, 1, 1, 2, 2, 3 | 1.0* | 2.7 | .04
F 3 | 5, 1, 1, 1, 1, 4, 2, 4, 3, 2 | .39 | 1.5 | .21
F 4 | 3, 4, 5, 2, 1, 3, 3, 1, 3, 4 | .20 | 1.2 | .53
F 5 | 2, 1, 2, 3, 4, 1, 1, 2, 2, 3 | .67 | 2.0 | .08
F 6 | 1, 2, 2, 1, 2, 1, 1, 1, 2, 1 | 1.4* | 4.0 |

All b’s were positive, thus consistent with the hypothesis, but many samples turned out quite differently. A real study would require a far larger sample size and training coders so that their work product was more reliable.
Logistic regression – deriving exp(b) from b
Sometimes authors don’t include a column for odds ratios, but in the text they may still describe the effects of the IV’s on the DV in percentage terms. Go figure!

To derive exp(b) yourself, use an exponents calculator. For “number,” always enter the constant 2.72 (an approximation of e, about 2.71828). For “exponent,” enter the b or B value, also known as the “log-odds.” The result is the odds ratio, also known as exp(b).

In the left example the b is 1.21, and the exp(b) is 3.36. Meaning, for each unit change in the IV, the odds of the DV increase 236 percent. In the right example the b is negative (note the sign) and the exp(B) is .543. Meaning, for each unit change in the IV, the odds decrease 46 percent ((1 - .543) × 100 ≈ 46).
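Instead of an exponents calculator, any language’s exp function does the same job. `math.exp` uses the full constant e ≈ 2.71828 rather than the 2.72 shortcut, so results can differ slightly in the last decimal:

```python
import math

b = 1.21                       # log-odds from the left example
print(round(math.exp(b), 2))   # ~3.35 (the slide's 3.36 reflects rounding of b)

# The reverse direction: from an odds ratio back to the log-odds
odds_ratio = 0.543             # the right example's exp(B)
print(round(math.log(odds_ratio), 2))   # ~ -0.61
print(round((1 - odds_ratio) * 100))    # 46: the odds decrease about 46 percent
```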
READING TABLES
Two dependent variables: Commitment to CJ system & commitment to mental health system
A “typical” table. Independent variables run along the left column.
b: regression coefficient, with probabilities (*, **, ***). Positive means the IV and DV go up and down together; negative means as one rises the other falls.
SE: standard error of the measurement.
OR: odds ratio (Exp B), usually reported when logistic regression is used.

David M. Ramey, “The Influence of Early School Punishment and Therapy/Medication on Social Control Experiences During Young Adulthood,” Criminology (54:1, Feb. 2016)
Reporting effects on the dependent variable under different combinations (“models”) of the independent variables
DV: “co-offending.” Each “model” is a unique combination of independent variables. The regression coefficient: positive means the IV and DV go up and down together; negative means as one rises the other falls. The main IV’s are measured in different ways (each is a separate independent variable). There are also additional, “control” independent variables. Each is measured on a scale or, if nominal (e.g., gender, M or F), is coded 0 or 1. The value displayed on the table is “1”, and the other value is “0”. Here male = 1, so female = 0; White = 1, so non-white = 0. If there are more than two values, say Black, White and other, Black and White become separate 0/1 variables and “other” becomes the “comparison group.”

Holly Nguyen and Jean Marie McGloin, “Does Economic Adversity Breed Criminal Cooperation? Considering the Motivation Behind Group Crime,” Criminology (51:4, 2013)
Reporting effects on dependent variable under multiple conditions
Dependent variable: victimization. Independent variables run down the left column. According to the footnotes, the table reports the odds of victimization while drinking compared to victimization while sober, and the odds of victimization while sober compared to not being victimized. Richard B. Felson and Keri B. Burchfield, “Alcohol and the Risk of Physical and Sexual Assault Victimization,” Criminology (42:4, 2004)

A second example: dependent variable, satisfaction with police. Effects on the DV are measured under two conditions of the control variable neighborhood disadvantage (low/high). Yuning Wu, Ivan Y. Sun and Ruth A. Triplett, “Race, Class or Neighborhood Context: Which Matters More in Measuring Satisfaction With Police?,” Justice Quarterly (26:1, 2009)
Sometimes probabilities are given in a dedicated column; there may be no asterisks, or they may be in an unusual place, such as at the end of variable names. The probability that the null hypothesis is true (that the coefficient was generated by chance): * <.05, ** <.01, *** <.001.

Shelley Johnson Listwan, Christopher J. Sullivan, Robert Agnew, Francis T. Cullen and Mark Colvin, “The Pains of Imprisonment Revisited: The Impact of Strain on Inmate Recidivism,” Justice Quarterly (30:1, 2013)
And just when you thought you had it “down”…
It’s rare, but sometimes the categories of the dependent variable run in rows, and the independent variable categories run in columns. Hypothesis: SOCP (intensive supervision) → fewer violations.

Jodi Lane, Susan Turner, Terry Fain and Amber Sehgal, “Evaluating an Experimental Intensive Juvenile Probation Program: Supervision and Official Outcomes,” Crime & Delinquency (51:1, 2005)
INTERPRETIVE ISSUES
A caution on hypothesis testing…
Probability statistics are the most common way to evaluate relationships, but they have been criticized for suggesting misleading results. (Click here for a summary of the arguments.) We normally use p-values to accept or reject null hypotheses, but their actual meaning is more subtle. Formally, p < .05 means that if an association between variables were tested an infinite number of times, a test statistic coefficient as large as the one actually obtained (say, an r of .3) would come up less than five times in a hundred if the null hypothesis of no relationship were actually true.

For our purposes, as long as we keep in mind the inherent sloppiness of social science, and the difficulty of accurately quantifying social phenomena, it is sufficient to use p-values to accept or reject null hypotheses. We should always be skeptical of findings of “significance,” particularly when very large samples are involved, as even weak relationships will tend to be statistically significant. (See next slide.)
Statistical significance v. size of the effect
Once we are confident that an effect was NOT caused by chance, we need to inspect its magnitude. Consider this example from an article that investigated the “marriage effect” (N = 8,984). Logistic regression was used to measure the association of disadvantage (coded 0/1) and the probability of arrest (Y/N) under four conditions (not important here).

Disadvantage (1/0), b (SE):
Model 1: .078 * (.037)
Model 2: .119 NS (.071)
Model 3: .011 (.107)
Model 4: .320 *** (.091)

Without knowing more, it seems that the association between these two variables is confirmed in model 1 (p < .05) and model 4 (p < .001). But just how meaningful are these associations? Logistic regression was used, so we can calculate exp B’s. For model 1, the exp B is 1.08, meaning that “disadvantaged” persons are eight percent more likely to have been arrested than the non-disadvantaged. That’s a tiny increase. For model 4 the exp B climbs to 1.38, about 38 percent (a little more than one-third) more likely.

Since standard error decreases as sample size increases, large samples have a well-known tendency to identify trivial relationships as “significant.” Aside from exp B, R2 is another statistic that can help clue us in on just how meaningful relationships are “in the real world.”

Bianca E. Bersani and Elaine Eggleston Doherty, “When the Ties That Bind Unwind: Examining the Enduring and Situational Processes of Change Behind the Marriage Effect,” Criminology (51:2, 2013)
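The point about large samples follows from the standard error formula. For a mean, SE = sd / sqrt(n), so the error shrinks as the sample grows. A sketch with an arbitrary sd of 15 (the N of 8,984 comes from the study discussed above; everything else here is hypothetical):

```python
import math

def standard_error(sd, n):
    """Standard error of a mean: shrinks with the square root of the sample size."""
    return sd / math.sqrt(n)

for n in (100, 1000, 8984):
    print(n, round(standard_error(15, n), 3))
# Same sd, ever-smaller error as n grows, so ever-smaller
# coefficients clear the bar for statistical significance.
```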
Final exam practice
The final exam will ask the student to interpret a table. The hypothesis will be provided. Students will have to:
identify the dependent and independent variables;
recognize whether relationships are positive or negative;
recognize whether relationships are statistically significant, and if so, to what extent;
explain the effects described by odds ratios (exp b) using percentages;
recognize and interpret how the effects change as one moves across models (different combinations of the independent variables) and across different levels of the dependent variable.

For more information about reading tables please refer to the week 14 slide show and its many examples. IMPORTANT: tables must be interpreted strictly using the techniques learned in this course. Leave personal opinions behind. For example, if a relationship supports the notion that wealth causes crime, then wealth causes crime!

A sample question and answer follow.
Hypothesis: unstructured socializing and other factors → youth violence

Q: In which model does Age have the greatest effect? A: Model 1.
Q: What is its numerical significance? A: .001.
Q: Use words to explain that figure. A: There is less than one chance in 1,000 that the relationship between age and violence is due to chance.
Q: Use the odds ratio (same as exp b) to describe the percentage effect of Age on violence in Model 1. A: For each year of age increase, violence is seventeen percent more likely.
Q: What happens to Age as it moves from Model 2 to Model 3? What seems most responsible? A: Age becomes non-significant. The most likely cause is the introduction of the variable Deviant Peers.