Data Analysis Module: Chi Square

R Programming Data Analysis Module: Chi Square

Data Analysis Module
- Basic Descriptive Statistics and Confidence Intervals
- Basic Visualizations: Histograms, Pie Charts, Bar Charts, Scatterplots
- T-tests/Bivariate testing: One Sample, Paired, Independent Two Sample
- ANOVA
- Chi Square and Odds
- Regression Basics

When presented with categorical data, one common method of analysis is the “Contingency Table” or “Cross Tab”. This is a great way to display frequencies. For example, let's say that a firm has the following data:
- 120 male and 80 female employees
- 40 males and 10 females have been promoted

Using this data, we could create the following 2x2 matrix:

          Promoted   Not Promoted   Total
Male         40           80         120
Female       10           70          80
Total        50          150         200

Now, a few questions…
1. From the data, what is the probability of being promoted?
2. Given that you are male, what is the probability of being promoted?
3. Given that you are promoted, what is the probability that you are male?
4. Given that you are female, what is the probability of being promoted?
5. Given that you are promoted, what is the probability that you are female?
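The questions above can be answered directly from the table. Here is a minimal sketch in base R, assuming the counts are typed in by hand as a matrix (the object names `promo`, `p_promoted`, `by_gender` and `by_status` are illustrative and not from the slides):

```r
# Observed counts from the example (object name is illustrative)
promo <- matrix(c(40, 80,
                  10, 70),
                nrow = 2, byrow = TRUE,
                dimnames = list(Gender = c("Male", "Female"),
                                Status = c("Promoted", "Not Promoted")))

# Question 1: overall probability of promotion = column total / grand total
p_promoted <- sum(promo[, "Promoted"]) / sum(promo)

# Questions 2 and 4: P(Status | Gender), each row divided by its row total
by_gender <- prop.table(promo, margin = 1)

# Questions 3 and 5: P(Gender | Status), each column divided by its column total
by_status <- prop.table(promo, margin = 2)

print(p_promoted)
print(round(by_gender, 3))
print(round(by_status, 3))
```

With these counts, the overall promotion rate is 0.25, compared with about 0.333 for males and 0.125 for females. Of those promoted, 80% are male and 20% are female.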

The answers to these questions help us start to understand whether promotion status and gender are related. Specifically, we could test this relationship using a Chi-Square test, which is used to determine whether two categorical variables are related. The relevant hypothesis statements for a Chi-Square test are:
H0: Variable 1 and Variable 2 are NOT related
Ha: Variable 1 and Variable 2 ARE related
Develop the appropriate hypothesis statements and testing matrix for the gender/promotion data.

The Chi-Square Test uses the χ² test statistic, which has a distribution that is skewed to the right (it approaches normality as the degrees of freedom increase). The statistic compares the observed and expected counts in every cell of the table: χ² = Σ (Observed - Expected)² / Expected. The observed counts are provided in the dataset. The expected counts are the counts which would be expected if there were NO relationship between the two variables.

Going back to our example, the data provided is “observed”:

          Promoted   Not Promoted   Total
Male         40           80         120
Female       10           70          80
Total        50          150         200

What would the matrix look like if there were no relationship between promotion status and gender? The resulting matrix would be “expected”…

From the data, 25% of all employees were promoted. Therefore, if gender plays no role, then we should see 25% of the males promoted (75% not promoted) and 25% of the females promoted (75% not promoted)…

          Promoted        Not Promoted     Total
Male      120*.25 = 30    120*.75 = 90      120
Female     80*.25 = 20     80*.75 = 60       80
Total          50              150           200

Notice that the marginal values did not change…only the interior values changed.
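The same expected counts can be computed for a table of any size as (row total × column total) / grand total. Here is a short sketch in base R (the object names `observed` and `expected` are illustrative):

```r
# Observed counts from the example
observed <- matrix(c(40, 80,
                     10, 70),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(Gender = c("Male", "Female"),
                                   Status = c("Promoted", "Not Promoted")))

n <- sum(observed)  # grand total (200)

# Expected count in each cell = row total * column total / grand total
expected <- outer(rowSums(observed), colSums(observed)) / n

print(expected)

# The margins are unchanged: only the interior cells differ from the observed table
print(rowSums(expected))
print(colSums(expected))
```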

Now, calculate the χ² statistic using the observed and the expected matrices:
χ² = ((40-30)²/30) + ((80-90)²/90) + ((10-20)²/20) + ((70-60)²/60) = 3.33 + 1.11 + 5.00 + 1.67 = 11.11
This plays the same role as a t-statistic or a z-score: it is the value we compare against a critical value.
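The hand calculation can be checked cell by cell. Here is a minimal sketch in base R, with both matrices typed in from the example (the names `contrib` and `chi_sq` are illustrative):

```r
# Observed and expected counts from the example
observed <- matrix(c(40, 80,
                     10, 70), nrow = 2, byrow = TRUE)
expected <- matrix(c(30, 90,
                     20, 60), nrow = 2, byrow = TRUE)

# Contribution of each cell: (observed - expected)^2 / expected
contrib <- (observed - expected)^2 / expected

# The test statistic is the sum over all cells
chi_sq <- sum(contrib)

print(round(contrib, 2))
print(round(chi_sq, 2))
```

Looking at the individual contributions shows which cells drive the result. Here the largest contribution comes from promoted females (10 observed against 20 expected).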

To determine if this is in the rejection region, we must determine the degrees of freedom (df): df = (r-1)*(c-1), where r is the number of rows and c is the number of columns. In the current example, we have two rows and two columns, so df = 1*1 = 1. At alpha = .05 and 1 df, the critical value is 3.84…our value of 11.11 is clearly in the rejection region…so what does this mean?
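The critical value and a p-value can be looked up with the base R chi-square distribution functions rather than a printed table. This is a sketch using the statistic from the example (the variable names are illustrative):

```r
chi_sq <- 11.11          # test statistic from the example
df <- (2 - 1) * (2 - 1)  # (rows - 1) * (columns - 1)
alpha <- 0.05

# Critical value: the point with alpha of the distribution to its right
crit <- qchisq(1 - alpha, df = df)

# p-value: probability of a statistic at least this large if H0 is true
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

reject <- chi_sq > crit

print(round(crit, 2))
print(p_value)
print(reject)
```

Because 11.11 exceeds the critical value of 3.84, we reject H0 at alpha = .05. In other words, the data suggest that promotion status and gender are related in this firm.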

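# Here, the code is pretty simple. First install and load the "prettyR" package:
install.packages("prettyR")
library(prettyR)
# Then, you can run an xtab:
xtab(var1 ~ var2, data = data)
# Then a Chi Squared test (correct = FALSE turns off the continuity correction):
chisq.test(data$var1, data$var2, correct = FALSE)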
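To see the whole workflow on the promotion example, here is a self-contained sketch. It assumes a data frame with one row per employee (the `employees` data frame and its column names are made up for illustration), and it uses base R's `table()` as a stand-in for the prettyR cross tab so that no extra package is needed:

```r
# Rebuild one row per employee from the counts in the example
# (hypothetical data frame and column names)
employees <- data.frame(
  gender   = rep(c("Male", "Female"), times = c(120, 80)),
  promoted = c(rep(c("Yes", "No"), times = c(40, 80)),   # males
               rep(c("Yes", "No"), times = c(10, 70)))   # females
)

# Cross tab of the two categorical variables (base R stand-in for xtab)
tab <- table(employees$gender, employees$promoted)
print(tab)

# Chi-Square test without the continuity correction, as in the slides
result <- chisq.test(employees$gender, employees$promoted, correct = FALSE)
print(result)

# The expected counts used by the test
print(result$expected)
```

The reported statistic should match the hand calculation of 11.11 on 1 df. Note that `chisq.test` applies a continuity correction to 2x2 tables by default, so leaving out `correct = FALSE` gives a somewhat smaller statistic.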
