CATEGORICAL VARIABLES

Testing hypotheses using CATEGORICAL VARIABLES

Assessing relationships between categorical variables

Hypothesis: a formal statement of cause and effect, derived from the research question and the literature review. Example: better demeanor → more leniency.

To test a relationship between categorical variables, we build TWO tables:
- Frequencies table
  - Dependent variable values in COLUMNS
  - Independent variable values in ROWS
  - Frequencies (number of cases with the corresponding values) in the cells
- Percentages table
  - Calculated row by row, top to bottom
  - ROW percentages: each row totals 100%

This is the only acceptable way to build tables in this class. You must do it this way to earn credit on the exams.

Frequencies:
                   Officer's disposition
Youth's demeanor   Admonish   Reprimand   Citation   Arrest     n
Cooperative              24          15          4        2    45
Uncooperative             1           1          5       14    21

Percentages:
                   Officer's disposition
Youth's demeanor   Admonish   Reprimand   Citation   Arrest   Total
Cooperative             53%         33%         9%       5%    100%
Uncooperative            5%          5%        23%      67%    100%
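
The row-percentage rule can be illustrated with a short Python sketch. It is not part of the course materials, and the helper name `row_percentages` is hypothetical. It divides each cell in a row by that row's total (n), so every row sums to 100%; here it is applied to the Cooperative row of the demeanor example.

```python
def row_percentages(frequencies):
    """Convert one row of a frequencies table into row percentages.

    Each cell is divided by the row total (n), so the row sums to 100%.
    """
    n = sum(frequencies)
    return [100 * f / n for f in frequencies]


# Cooperative youths: Admonish, Reprimand, Citation, Arrest (n = 45)
cooperative = [24, 15, 4, 2]
pcts = row_percentages(cooperative)
print([round(p, 1) for p in pcts])  # [53.3, 33.3, 8.9, 4.4]
```

Rounded to whole numbers these are 53%, 33%, 9% and 4%. The slide shows 5% for Arrest, apparently rounded up so that the displayed row totals 100%.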

Recoding continuous to categorical variables

Hypothesis: higher-income persons drive more expensive cars
- Independent variable: income, measured categorically (nominal variable)
  - Two values: low income and high income
  - Income is measured by where a car is parked: student lot (low income) or faculty-staff lot (high income)
- Dependent variable: car value, measured categorically (ordinal variable)
  - Three values: Low, Medium, High
- Sampling
  - Stratified, disproportionate, systematic sampling of 10 cars from a student lot and 10 cars from a faculty-staff lot
- Coding
  - Income is automatically coded by car location (faculty-staff or student lot)
  - Car values must be recoded from the original five-point scale into three categories: Low, Medium, High
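
A minimal sketch of the sampling design in Python, under stated assumptions: each lot is treated as a stratum with its own sampling frame (a list of parking spaces), the same number of cars (10) is drawn from each lot regardless of lot size (which is what makes the design disproportionate), and within a lot every k-th space is taken from a random start. The lot sizes, space labels, and the helper `systematic_sample` are all hypothetical.

```python
import random


def systematic_sample(frame, n, rng):
    """Draw a systematic sample of n elements from a sampling frame.

    The sampling interval k is the frame size divided by n (rounded down).
    Starting at a random position within the first interval, every k-th
    element is taken.
    """
    k = len(frame) // n
    if k < 1:
        raise ValueError("sampling frame is smaller than the requested sample")
    start = rng.randrange(k)
    return [frame[start + i * k] for i in range(n)]


# Hypothetical sampling frames: parking-space labels in each lot (sizes invented).
# The lots differ in size, but 10 cars are drawn from each: disproportionate.
student_lot = [f"S{i:03d}" for i in range(1, 401)]  # 400 spaces
faculty_lot = [f"F{i:03d}" for i in range(1, 121)]  # 120 spaces

rng = random.Random(42)  # fixed seed so the draw can be reproduced
sample = {
    "student": systematic_sample(student_lot, 10, rng),
    "faculty-staff": systematic_sample(faculty_lot, 10, rng),
}
print(sample)
```

With 400 student spaces the interval is k = 40; with 120 faculty-staff spaces it is k = 12.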

Step 1: Recoding
Recode the continuous 1-5 car value measure into three categories (L/M/H):
  1-2: LOW    3: MED    4-5: HIGH

Frequencies:
                    DV: Car value
IV: Income          LOW   MED   HIGH     n
LOW (student lot)    10     0      0    10
HIGH (F/S lot)        0     4      6    10

Step 2: Percentaging
Convert the frequencies into percentages. Convert each row separately, so the cells in each row add to 100 percent.

Percentages:
                    DV: Car value
IV: Income           LOW    MED   HIGH   Total
LOW (student lot)   100%     0%     0%    100%
HIGH (F/S lot)        0%    40%    60%    100%
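
Steps 1 and 2 can be sketched together in Python. The raw 1-5 scores below are invented for illustration (the slides report only the recoded counts), and `recode_car_value` and `crosstab` are hypothetical helper names. IV values become rows and DV categories become columns, matching the class convention.

```python
from collections import Counter

CATEGORIES = ("LOW", "MED", "HIGH")


def recode_car_value(score):
    """Recode the original 1-5 car value score into LOW / MED / HIGH."""
    if score in (1, 2):
        return "LOW"
    if score == 3:
        return "MED"
    if score in (4, 5):
        return "HIGH"
    raise ValueError(f"car value score must be 1-5, got {score!r}")


def crosstab(cases):
    """Build a frequencies table and a row-percentages table.

    cases: iterable of (iv_value, dv_category) pairs.
    Returns two dicts keyed by IV value: [LOW, MED, HIGH, n] counts,
    and [LOW, MED, HIGH] row percentages.
    """
    counts = {}
    for iv, dv in cases:
        counts.setdefault(iv, Counter())[dv] += 1
    freq, pct = {}, {}
    for iv, row in counts.items():
        n = sum(row.values())
        freq[iv] = [row[c] for c in CATEGORIES] + [n]
        pct[iv] = [100 * row[c] / n for c in CATEGORIES]
    return freq, pct


# Hypothetical raw 1-5 scores, invented for illustration.
student_scores = [1, 2, 2, 1, 1, 2, 1, 2, 2, 1]
faculty_scores = [3, 4, 5, 3, 5, 4, 3, 5, 3, 4]

cases = [("LOW (student lot)", recode_car_value(s)) for s in student_scores]
cases += [("HIGH (F/S lot)", recode_car_value(s)) for s in faculty_scores]
freq, pct = crosstab(cases)
for iv in freq:
    print(iv, freq[iv], [f"{p:.0f}%" for p in pct[iv]])
```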

Step 3: Distributions
Draw the distributions and examine their shapes.
[Bar charts: distribution of car values (L/M/H) in the student lot and in the faculty-staff lot]

Step 4: Analysis
Switch values of the independent variable. Does the distribution of cases change substantially, in the predicted direction?
- All the cars in the student lot are low value.
- Sixty percent of the cars in the F/S lot are high value.
As we "switch" values of the IV, the distribution changes in the hypothesized direction, with larger proportions of more expensive cars in the F/S lot. Hypothesis confirmed!

An example where the hypothesis is NOT confirmed
[Bar charts: distribution of car values (L/M/H) in the student lot and in the faculty-staff lot]

Percentages:
                    DV: Car value
IV: Income           LOW    MED   HIGH   Total
LOW (student lot)    30%    20%    50%    100%
HIGH (F/S lot)        0%    40%    60%    100%

- Differences between the rows are relatively slight.
- For both income levels, the distribution is skewed toward higher car values.
Is there a more accurate way to assess possible differences? You bet! That's where we're heading...

Hypothesis: poverty → crime
Remember: if there appears to be an effect, is it in the hypothesized direction?
- IV: poverty, measured by income. Income has two values: low and high.
- DV: crime, measured by arrests. Arrests has two values: never arrested and arrest record.
To test the hypothesis, switch from one category of the IV to the other:
- Does the distribution of cases along the DV change?
- Is the change substantial?
- Is the change in the hypothesized direction?

Outcome 1:
               Never arrested   Arrest record   Total
Low income                50%             50%    100%
High income               50%             50%    100%
The distribution is unaffected. There seems to be no connection between income and arrest record. The hypothesis is rejected.

Outcome 2:
               Never arrested   Arrest record   Total
Low income                20%             80%    100%
High income               80%             20%    100%
The distribution flips in the expected direction. High-income persons seem much less likely to have an arrest record. The hypothesis is confirmed.

Outcome 3:
               Never arrested   Arrest record   Total
Low income                80%             20%    100%
High income               20%             80%    100%
The distribution flips in the opposite direction. High-income persons seem much more likely to have an arrest record. The hypothesis is rejected.
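
Judging whether the distribution changes "in the hypothesized direction" can be made explicit with a small Python sketch. The helper `assess_effect` and its 10-percentage-point `threshold` for a "substantial" change are assumptions for illustration, not a rule from the course. The three input pairs are the arrest-record percentages from the three outcomes above.

```python
def assess_effect(pct_low_income, pct_high_income, threshold=10.0):
    """Judge a two-category comparison for the hypothesis poverty -> crime.

    Arguments are the percentages WITH an arrest record among low-income and
    high-income persons. The hypothesis predicts a higher percentage for the
    low-income group. `threshold` (in percentage points) is an arbitrary,
    illustrative cutoff for a "substantial" change.
    """
    diff = pct_low_income - pct_high_income
    if abs(diff) < threshold:
        return "no substantial change: hypothesis rejected"
    if diff > 0:
        return "change in hypothesized direction: hypothesis confirmed"
    return "change in opposite direction: hypothesis rejected"


# Arrest-record percentages (low income, high income) for Outcomes 1, 2 and 3.
for low, high in [(50, 50), (80, 20), (20, 80)]:
    print(f"{low}% vs {high}% -> {assess_effect(low, high)}")
```

Any fixed threshold is arbitrary, which is why eyeballing percentage changes is only a first step.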

Elaboration analysis
- Zero-order table
- First-order partial tables
- Replication
- Explanation
- Specification

Could another IV be at play?
Remember our poverty → crime example?
- Intervening variable: could lack of education, or living in a violent area, be the more proximate (closer) cause of crime?
  Poverty → poor education → crime
  Here poverty still affects crime, but mostly through the intervening variable, education, which is the more proximate cause.
- Spurious relationship: what seems to be a relationship isn't; it's bogus! This is often caused by a strong association between the independent variable of interest (e.g., poverty) and another independent variable (e.g., poor social controls) that turns out to be the real cause.
  Poor social controls → crime
  Poverty ↛ crime (no real causal link)

PRACTICAL EXERCISE
Hypothesis: Higher rank → less cynicism
Using "elaboration analysis" to determine whether another IV might be at play
- Sample of 100 officers and 100 supervisors
- Twenty officers scored low on cynicism; 80 scored high
- Fifty supervisors scored low on cynicism; 50 scored high
- Build a frequency table, then convert it to percentages
- Be sure to place the categories of the dependent variable in columns, and the categories of the independent variable in rows

PRACTICAL EXERCISE
Hypothesis: Higher rank → less cynicism

Frequencies:
              Cynicism
Rank          Low   High     n
Officers       20     80   100
Supervisors    50     50   100
Total                      200

Percentages:
              Cynicism
Rank          Low   High   Total
Officers      20%    80%    100%
Supervisors   50%    50%    100%

Well, that was easy! It looks like the hypothesis (higher rank, less cynicism) is confirmed.
Of course, we can't stop here. There are many variables floating around. Are there any other variables that are related to our independent variable, rank, and that could possibly affect the dependent variable, cynicism?
A literature review reveals that gender is associated with rank, and that gender may also affect cynicism.
- Gender could be an "intervening" variable, mediating the relationship with cynicism:
  Rank → Gender → Cynicism
- Or it could be the real cause of changes in cynicism. If it were, the apparent (but nonexistent) relationship between rank and cynicism would be called "spurious."
How do we disentangle the possibilities? Coming up!

Hypothesis: Higher rank → less cynicism
Elaboration analysis: using first-order partial tables to analyze the effects of a "control" variable
Let's "elaborate" (dig deeper). Does the effect of rank on cynicism hold regardless of gender?
- Gender is used as a "control" variable. We will test the original, "zero-order" relationship between rank and cynicism, "controlling" for each value of gender.
- Gender is categorical, so we keep using tables.
- Create two tables (one with frequencies, the other with percentages) for each value of the control variable gender.
- The tables are just like the original, but each pair includes only cops of one gender (M or F).
- These tables are called "first-order partial tables" because:
  - "First order": it's our first "control" variable
  - "Partial": each table includes only cases at one value of the control variable gender (male or female)
- To distinguish them, the original tables are called "zero-order."

Data for gender
MALES
  Officers: 10 low cynicism, 50 high
  Supervisors: 35 low, 35 high
FEMALES
  Officers: 10 low, 30 high
  Supervisors: 15 low, 15 high
Analyze the percentage tables. What do you find?

First-order partial tables: gender

Males (frequencies):
              Cynicism
Rank          Low   High     n
Officers       10     50    60
Supervisors    35     35    70
Total                      130

Females (frequencies):
              Cynicism
Rank          Low   High     n
Officers       10     30    40
Supervisors    15     15    30
Total                       70

Males (percentages):
              Cynicism
Rank          Low   High   Total
Officers      17%    83%    100%
Supervisors   50%    50%    100%

Females (percentages):
              Cynicism
Rank          Low   High   Total
Officers      25%    75%    100%
Supervisors   50%    50%    100%

Compared with the original zero-order percentages (20% of officers and 50% of supervisors low in cynicism), both values of the control variable gender, male and female, yield about the same findings. Knowing an officer's gender changes nothing. We have replicated the zero-order findings: gender is not a factor in rank → cynicism, and our hypothesis is confirmed.
This outcome is called: REPLICATION
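
Building first-order partial tables amounts to splitting the cases by the value of the control variable and percentaging each subset separately. The Python sketch below assumes the counts are stored in a dictionary keyed by (control value, rank, cynicism); that data layout and the helper names `partial_tables` and `zero_order` are hypothetical. Collapsing the partial tables over the control variable should give back the zero-order table, which is a useful check on the data.

```python
from collections import Counter

# Frequencies from the gender exercise, keyed by (control value, rank, cynicism).
GENDER_DATA = {
    ("Male", "Officers", "Low"): 10, ("Male", "Officers", "High"): 50,
    ("Male", "Supervisors", "Low"): 35, ("Male", "Supervisors", "High"): 35,
    ("Female", "Officers", "Low"): 10, ("Female", "Officers", "High"): 30,
    ("Female", "Supervisors", "Low"): 15, ("Female", "Supervisors", "High"): 15,
}


def partial_tables(counts):
    """Split cases by control-variable value and compute row percentages.

    Returns {control value: {rank: (pct_low, pct_high, n)}}.
    """
    tables = {}
    for (control, rank, cynicism), k in counts.items():
        rows = tables.setdefault(control, {})
        rows.setdefault(rank, Counter())[cynicism] += k
    result = {}
    for control, rows in tables.items():
        result[control] = {}
        for rank, row in rows.items():
            n = row["Low"] + row["High"]
            result[control][rank] = (100 * row["Low"] / n, 100 * row["High"] / n, n)
    return result


def zero_order(counts):
    """Collapse over the control variable to recover the zero-order counts."""
    collapsed = Counter()
    for (_control, rank, cynicism), k in counts.items():
        collapsed[(rank, cynicism)] += k
    return collapsed


partials = partial_tables(GENDER_DATA)
for gender, rows in partials.items():
    for rank, (low, high, n) in rows.items():
        print(f"{gender:6} {rank:11} low {low:4.0f}%  high {high:4.0f}%  n={n}")

zero = zero_order(GENDER_DATA)
print(dict(zero))
```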

Hypothesis: Higher rank → less cynicism
But we're not done yet! Our literature review suggested that another variable associated with rank, time on the job, may also affect cynicism. Let's "control" for time on the job. Here's the data:
LESS THAN FIVE YEARS ON THE JOB
  Officers: 0 low cynicism, 75 high cynicism
  Supervisors: 2 low cynicism, 40 high cynicism
MORE THAN FIVE YEARS ON THE JOB
  Officers: 20 low, 5 high
  Supervisors: 48 low, 10 high
Create first-order partial tables for time on the job, convert the tables to percentages, and analyze the results...

First-order partial tables: time on the job

< 5 years (frequencies):
              Cynicism
Rank          Low   High     n
Officers        0     75    75
Supervisors     2     40    42
Total                      117

5+ years (frequencies):
              Cynicism
Rank          Low   High     n
Officers       20      5    25
Supervisors    48     10    58
Total                       83

< 5 years (percentages):
              Cynicism
Rank          Low   High   Total
Officers       0%   100%    100%
Supervisors    5%    95%    100%

5+ years (percentages):
              Cynicism
Rank          Low   High   Total
Officers      80%    20%    100%
Supervisors   83%    17%    100%

Within each value of the control variable, rank makes almost no difference. Among cops with less than five years on the job, nearly all are highly cynical (officers 100%, supervisors 95%); among those with more years, most are low in cynicism (officers 80%, supervisors 83%). Time on the job, not rank, is what is strongly associated with cynicism. This completely "explains" (wipes away) the apparent relationship between rank and cynicism. Our hypothesis (higher rank → less cynicism) is rejected: it's less time on the job → more cynicism.
This outcome is called: EXPLANATION

Hypothesis: Higher rank → less cynicism
But we're not done yet! Our literature review suggested that another variable associated with rank, education, may also affect cynicism. Let's "control" for education. Here's the data:
HIGH SCHOOL
  Officers: 10 low cynicism, 50 high cynicism
  Supervisors: 9 low cynicism, 11 high cynicism
COLLEGE
  Officers: 30 low, 10 high
  Supervisors: 65 low, 15 high
Create first-order partial tables for educational level, convert the tables to percentages, and analyze the results...

First-order partial tables: education

High school (frequencies):
              Cynicism
Rank          Low   High     n
Officers       10     50    60
Supervisors     9     11    20
Total                       80

College (frequencies):
              Cynicism
Rank          Low   High     n
Officers       30     10    40
Supervisors    65     15    80
Total                      120

High school (percentages):
              Cynicism
Rank          Low   High   Total
Officers      17%    83%    100%
Supervisors   45%    55%    100%

College (percentages):
              Cynicism
Rank          Low   High   Total
Officers      75%    25%    100%
Supervisors   81%    19%    100%

For education level, our findings are split. Among high school graduates, the rank effect resembles the zero-order table (17% of officers versus 45% of supervisors low in cynicism). Among college graduates, the rank difference nearly disappears (75% versus 81%), and college-educated cops of both ranks are much less cynical. One value of the control variable, education, teaches us something new.
This outcome is called: SPECIFICATION

Review: first-order partial analysis has three possible outcomes
First-order (and second-order, and so on) partial analyses yield three possible outcomes:
- Replication (first example): the original, "zero-order" relationship persists at each value of the control variable. Coding for this variable teaches us nothing. The hypothesis is confirmed.
- Explanation (second example): the original, "zero-order" relationship is absent from each value of the control variable. The effect of the IV on the DV in the zero-order table has been "explained away" by the control variable. The hypothesis is rejected.
- Specification (final example): the zero-order relationship persists for some, but not all, values of the control variable. Coding for this variable teaches us something. The hypothesis must be modified.
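
The three outcomes can be expressed as a rule in a Python sketch. The rank effect is measured as the percentage-point difference in low cynicism (supervisors minus officers). A partial table counts as keeping the relationship if its effect has the same sign and at least half the size of the zero-order effect; that one-half cutoff is an arbitrary assumption, and `pct_low`, `rank_effect` and `classify` are hypothetical helpers. The frequencies are the ones given for gender, time on the job, and education.

```python
def pct_low(low, high):
    """Percentage of a row scoring low on cynicism."""
    return 100 * low / (low + high)


def rank_effect(officers, supervisors):
    """Percentage-point difference in low cynicism, supervisors minus officers.

    officers, supervisors: (low, high) frequency pairs.
    """
    return pct_low(*supervisors) - pct_low(*officers)


def classify(zero_order_effect, partial_effects, keep_fraction=0.5):
    """Label an elaboration outcome: replication, explanation or specification.

    A partial table "keeps" the relationship if its effect has the same sign
    and at least keep_fraction of the zero-order effect's size
    (0.5 is an arbitrary, illustrative cutoff).
    """
    kept = [
        e * zero_order_effect > 0
        and abs(e) >= keep_fraction * abs(zero_order_effect)
        for e in partial_effects
    ]
    if all(kept):
        return "REPLICATION"
    if not any(kept):
        return "EXPLANATION"
    return "SPECIFICATION"


zero = rank_effect((20, 80), (50, 50))  # 30 percentage points
controls = {
    "gender": [rank_effect((10, 50), (35, 35)), rank_effect((10, 30), (15, 15))],
    "time on job": [rank_effect((0, 75), (2, 40)), rank_effect((20, 5), (48, 10))],
    "education": [rank_effect((10, 50), (9, 11)), rank_effect((30, 10), (65, 15))],
}
for name, effects in controls.items():
    print(name, classify(zero, effects))
```

With these data the rule reproduces the labels from the three examples. A different cutoff could label borderline cases differently.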

But isn't our analysis too "loosey-goosey"?
Assume there is a relationship between variables. When we "switch" the value of the IV, will the change in the DV always be obvious? No. And when the DV has multiple categories, as in our parking lot exercise, visually discerning an effect can be difficult or impossible. Bottom line: changes in percentage are not enough.

Great. Now what?
We can use the original frequencies table to build a second table, which projects what the frequencies would be if there were absolutely NO relationship between the variables. We then compare the two tables with a "test statistic" (a statistic used to test hypotheses) known as chi-square, χ². If this statistic is large enough, the tables are deemed "significantly" different, meaning that there IS a relationship between the variables. More on this during the third part of the semester!

Cynicism - Males (frequencies):
              Cynicism
Rank          Low   High     n
Officers       10     50    60
Supervisors    35     35    70
Total                      130

Cynicism - Males (percentages):
              Cynicism
Rank          Low   High   Total
Officers      17%    83%    100%
Supervisors   50%    50%    100%
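
The "no relationship" table and the chi-square comparison can be sketched in Python. Each expected frequency is (row total × column total) / grand total, and Pearson's chi-square sums (observed − expected)² / expected over all cells. The helper names are hypothetical; the data are the Males partial table above.

```python
def expected_frequencies(table):
    """Frequencies expected if there were NO relationship between the variables.

    Each expected cell = (row total * column total) / grand total.
    table: list of rows of observed frequencies (IV in rows, DV in columns).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]


def chi_square(table):
    """Pearson chi-square: sum over cells of (observed - expected)^2 / expected."""
    expected = expected_frequencies(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Males partial table: rows = Officers, Supervisors; columns = Low, High cynicism.
males = [[10, 50], [35, 35]]
print([[round(e, 1) for e in row] for row in expected_frequencies(males)])
# [[20.8, 39.2], [24.2, 45.8]]
print(round(chi_square(males), 2))  # 15.86
```

How large is "large enough" depends on the table's degrees of freedom and the chosen significance level; for a 2×2 table (1 degree of freedom) the conventional critical value at the 0.05 level is about 3.84. SciPy's `scipy.stats.chi2_contingency` computes the same quantities, but by default it applies Yates' continuity correction to tables with 1 degree of freedom, so pass `correction=False` to match this sketch.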

Where are we headed?
Hypothesis testing requires that we assess the influence on the DV of your IV of interest, and of other variables that may be related to your IV and could have their own effect on the DV.
- Tables in your articles parcel out the effects of all variables on the DV.
- A "test statistic" tells you how much influence on the DV is exerted by your IV of interest and by its "cousins."
- We'll be learning about various test statistics during the semester.
[Example tables from articles: dependent variable arrest (Y/N); dependent variable death sentence (Y/N)]

REMINDER: The ONLY acceptable way to build a table in this class
- Dependent variable values in COLUMNS
- Independent variable values in ROWS
- Frequencies (number of cases with the corresponding values) in the cells
- Two tables:
  - One table contains the frequencies
  - A second table, calculated from the first, contains ROW percentages
This is the only acceptable way to build tables in this class. You must do it this way to earn credit on the exams.

Example: better demeanor → more lenient disposition

Frequencies:
                   Officer's disposition
Youth's demeanor   Admonish   Reprimand   Citation   Arrest     n
Cooperative              24          15          4        2    45
Uncooperative             1           1          5       14    21

Percentages:
                   Officer's disposition
Youth's demeanor   Admonish   Reprimand   Citation   Arrest   Total
Cooperative             53%         33%         9%       5%    100%
Uncooperative            5%          5%        23%      67%    100%