1
Chi-Square (χ²)
2
Review: the “null” hypothesis
[Figure labels: Null hypothesis is true / Reject null hypothesis]
Inferential statistics are used to test hypotheses. Whenever we use inferential statistics, the “null hypothesis” applies.
- Null hypothesis: there is no relationship between the variables. Any apparent effect was produced by chance.
- To reject the null, the test statistic (e.g., R², t, b, χ²) must be so large that a result of that size would arise by chance fewer than five times in one hundred (p < .05).
- How do we know whether we can reject the null? Compare the test statistic to a table.
- “Probability,” or p, is the chance of getting a test statistic at least this large if the null hypothesis is true.
- In a study, look for asterisks in the statistic’s column. If there is no asterisk, the null hypothesis for that relationship cannot be rejected.
  - One asterisk (*) usually means p < .05: fewer than 5 chances in 100 that the result is due to chance.
  - Two asterisks (**) is better: p < .01, fewer than one chance in 100.
  - Three asterisks (***) is great: p < .001, fewer than one chance in 1,000.
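The asterisk convention can be sketched in a few lines of Python. This is only an illustration of the thresholds named on the slide; the function name `significance_stars` is hypothetical, and individual studies may define their asterisks differently.

```python
def significance_stars(p):
    """Map a p-value to the asterisk notation described on the slide.

    Assumed thresholds (from the slide): * for p < .05, ** for p < .01,
    *** for p < .001. The function name is illustrative only.
    """
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""  # no asterisk: the null hypothesis is not rejected


for p in (0.20, 0.03, 0.004, 0.0002):
    print(p, repr(significance_stars(p)))
```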
3
Test statistics
- Independent and dependent variables are continuous: regression (r² and R²). The b statistic is interpreted as the unit change in the DV for each unit change in the IV.
- Independent variables are nominal or continuous; dependent variable is nominal: logistic regression, which generates b and exp(b) (a.k.a. the odds ratio).
- Independent and dependent variables are categorical: Chi-Square (χ²).
- Categorical (dichotomous) independent variable and continuous dependent variable: difference between the means test (t statistic).

Procedure: Regression
  Level of measurement: all variables continuous
  Statistic: r², R²
    Interpretation: proportion of change in the dependent variable accounted for by change in the independent variable
  Statistic: b
    Interpretation: unit change in the dependent variable for a one-unit change in the independent variable

Procedure: Logistic regression
  Level of measurement: DV nominal and dichotomous; IVs nominal or continuous
  Statistic: b
    Interpretation: don’t try to interpret it directly; it is on a logarithmic scale
  Statistic: exp(b) (odds ratio)
    Interpretation: change in the odds of the DV if the IV changes one unit or, if the IV is dichotomous, if it changes its state

Procedure: Chi-Square
  Level of measurement: all variables categorical (nominal or ordinal)
  Statistic: χ²
    Interpretation: reflects the difference between Observed and Expected frequencies

Procedure: Difference between means
  Level of measurement: IV dichotomous, DV continuous
  Statistic: t
    Interpretation: reflects the magnitude of the difference

Use a table to determine whether the coefficient is large enough to reject the null hypothesis.
4
Chi-Square (χ²)
[Figure labels: Null hypothesis is true / Reject null hypothesis]
- A test statistic that tests the relationship between two categorical variables (nominal or ordinal).
- Yields a coefficient that can be looked up in a table.
- The larger the coefficient, the less plausible the null hypothesis.
- Chi-Square evaluates the difference between Observed and Expected cell frequencies:
  - “Observed” means the actual data.
  - “Expected” means what we would get if there were no relationship between the variables (we create this table).
- If there is no difference between the observed and expected frequencies, χ² is zero and there is no evidence against the null hypothesis.
- The greater the difference, the larger the value of χ², and the smaller the probability that the difference arose by chance.
REMEMBER: We always place the values of the IV in rows and the values of the DV in columns.

Observed (actual) frequencies
                       DV - Car value
IV - Income            LOW   MED   HIGH     n
LOW (student lot)       10     0      0    10
HIGH (F/S lot)           4     0      6    10

Expected frequencies (if car value and income are not related)
                       DV - Car value
IV - Income            LOW   MED   HIGH     n
LOW (student lot)        7     0      3    10
HIGH (F/S lot)           7     0      3    10
5
BUILDING & INTERPRETING TABLES
- Observed table
- Expected table
- Computing Chi-Square (χ²)
- Assessing the significance of Chi-Square (χ²)
6
Class exercise - building the “observed” table
Research question: Does gender affect how cases are treated by the CJ system?
Hypothesis: Gender → Disposition (a one-tailed version would be that males receive more severe treatment)
Sample: 100 males and 50 females arrested for shoplifting
- 84 males went to jail; 16 were cited and released
- 30 females went to jail; 20 were cited and released
YOUR ASSIGNMENT
1. Identify the independent and dependent variables.
2. Build a frequency table for the “observed” (actual) data. (A percentage table is not needed for Chi-Square.)
3. Be sure to place the values of the IV in rows and the values of the DV in columns.
4. Fill in the cells with the frequencies.
7
Building the “expected” table
Hypothesis: Gender → Disposition

“Observed” table - the actual data
              Disposition
Gender      Jail   Released   Total
Male          84         16     100
Female        30         20      50
Total        114         36   n = 150

“Expected” table - the “expected” frequencies if the null hypothesis of no relationship between the variables is true. Create a new table from scratch.

1. Bring over the “marginals” - all the totals.
              Disposition
Gender      Jail   Released   Total
Male                            100
Female                           50
Total        114         36   n = 150

2. Fill in each cell, one at a time: divide the cell’s row total by the grand total, then multiply by the column total.
   Male/Jail:
   Male/Released:
   Female/Jail:
   Female/Released:
8
Building the “expected” table
Hypothesis: Gender → Disposition

“Observed” table - the actual data
              Disposition (observed)
Gender      Jail   Released   Total
Male          84         16     100
Female        30         20      50
Total        114         36   n = 150

“Expected” table - the “expected” frequencies if the null hypothesis of no relationship between the variables is true. Create a new table from scratch.

1. Bring over the “marginals” - all the totals.
2. Fill in each cell, one at a time: divide the cell’s row total by the grand total, then multiply by the column total.
   Male/Jail:        100/150 × 114 = 76
   Male/Released:    100/150 × 36 = 24
   Female/Jail:      50/150 × 114 = 38
   Female/Released:  50/150 × 36 = 12

              Disposition (expected)
Gender      Jail   Released   Total
Male          76         24     100
Female        38         12      50
Total        114         36   n = 150
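The fill-in rule (row total divided by grand total, times column total) can be sketched in Python. This is a minimal illustration, assuming the table is stored as a list of rows with the IV (gender) in rows and the DV (disposition) in columns; the name `expected_table` is hypothetical.

```python
def expected_table(observed):
    """Build the expected frequencies from the marginals of an observed table.

    Each expected cell = (row total / grand total) * column total.
    `observed` is a list of rows; the name and layout are assumptions
    made for this sketch.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Rows: Male, Female. Columns: Jail, Released.
observed = [[84, 16], [30, 20]]
expected = expected_table(observed)
for label, row in zip(("Male", "Female"), expected):
    print(label, row)
```

Because only the marginals enter the calculation, any observed table with the same row and column totals would yield the same expected table.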
9
Demonstrating the meaning of “expected”
Percent tables (for show only - not used in Chi-Square)

Observed frequencies table
              Disposition
Gender      Jail   Released   Total
Male          84         16     100
Female        30         20      50
Total        114         36     150

Observed percentages
              Disposition
Gender      Jail   Released   Total
Male         84%        16%    100%
Female       60%        40%    100%

This observed table reveals a moderately strong relationship between gender and disposition.

Expected frequencies table
              Disposition
Gender      Jail   Released   Total
Male          76         24     100
Female        38         12      50
Total        114         36     150

Expected percentages
              Disposition
Gender      Jail   Released   Total
Male         76%        24%    100%
Female       76%        24%    100%

In an expected table there is NO relationship between the variables. It’s the null hypothesis!
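A short Python sketch can make the "no relationship" point concrete: converting each row to percentages shows that the expected table gives both genders the same split. The helper name `row_percentages` is an assumption for illustration.

```python
def row_percentages(table):
    """Convert each row of a frequency table to percentages of its row total."""
    return [[100 * cell / sum(row) for cell in row] for row in table]


# Rows: Male, Female. Columns: Jail, Released.
observed = [[84, 16], [30, 20]]
expected = [[76, 24], [38, 12]]

observed_pct = row_percentages(observed)
expected_pct = row_percentages(expected)

print("Observed %:", observed_pct)  # rows differ: gender and disposition are related
print("Expected %:", expected_pct)  # rows identical: no relationship
```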
10
Comparing the observed and expected tables: the meaning of Chi-Square (χ²)
- The observed table is the data, as we find it.
- The expected table is purposely built to demonstrate no relationship between the variables. It is the null hypothesis.
- To determine whether the observed table demonstrates a relationship between variables, we compare its cell frequencies to those in the “expected” table.
- The less similar the tables, the stronger the support for the working hypothesis and the weaker the case for the null hypothesis.
- χ² is a ratio that reports the dissimilarity between observed and expected frequencies. The more dissimilar, the larger the χ²:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

  O = observed (actual) frequency
  E = expected frequency (if the null hypothesis is true)
- More formally, χ² is the ratio of systematic variation to chance variation. The larger the ratio, the more likely that we can reject the null hypothesis.
- Chi-square is not always a good measure because its accuracy depends on sample size: it overestimates significance with large samples and underestimates it with small samples.
- The ideal sample size is around 150, with no cells less than 5.
11
Computing χ²
Always pair up the corresponding cells and divide by the expected frequency.

Observed frequencies
              Disposition
Gender      Jail   Released   Total
Male          84         16     100
Female        30         20      50
Total        114         36   n = 150

Expected frequencies
              Disposition
Gender      Jail   Released   Total
Male          76         24     100
Female        38         12      50
Total        114         36   n = 150

$$\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(84-76)^2}{76} + \frac{(16-24)^2}{24} + \frac{(30-38)^2}{38} + \frac{(20-12)^2}{12} = 10.5$$
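The cell-by-cell computation can be sketched in Python as a check on the arithmetic. The function name `chi_square` and the list-of-rows layout are assumptions for this illustration, not part of the slides.

```python
def chi_square(observed, expected):
    """Sum (O - E)^2 / E over corresponding cells of two equal-shaped tables."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Rows: Male, Female. Columns: Jail, Released.
observed = [[84, 16], [30, 20]]
expected = [[76, 24], [38, 12]]

chi2 = chi_square(observed, expected)
print(round(chi2, 1))  # 10.5
```

Every difference here is ±8, so each cell contributes 64 divided by its expected frequency; the cell with the smallest expected count (Female/Released, 12) contributes the most.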
12
Assessing the significance of χ²
[Figure labels: Null hypothesis is true / Reject null hypothesis]
- To reject the null hypothesis, a test statistic such as χ² must be of sufficient magnitude. The larger the better!
- df = (rows - 1) × (columns - 1) = (r - 1) × (c - 1) = (2 - 1) × (2 - 1) = 1
- In social science research we reject the null hypothesis when there are fewer than five chances in 100 (p < .05) that a result this large is due to chance.
- χ² = 10.5. Our chi-square is larger than what we need: with df = 1 it exceeds the .01 critical value (6.64) but not the .001 value (10.83), so there is less than one chance in one hundred (p < .01) that a difference this large arose by chance.
- Our observed data have proven so different from what would be expected if there were no relationship between the variables that we can reject the null hypothesis of no relationship. We thus confirm the working hypothesis that gender affects disposition. There is less than one chance in a hundred that we’re wrong!
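The table lookup can be sketched in Python. The critical values below are standard chi-square table entries rounded to three decimals; only df = 1 and df = 2 and three probability levels are included, and the names `CRITICAL_VALUES`, `degrees_of_freedom`, and `strictest_level` are hypothetical.

```python
# Standard chi-square critical values (rounded), keyed by df and then by level.
# Only the rows and columns needed for these slides are included (an assumption
# of this sketch; a full table has many more entries).
CRITICAL_VALUES = {
    1: {0.05: 3.841, 0.01: 6.635, 0.001: 10.828},
    2: {0.05: 5.991, 0.01: 9.210, 0.001: 13.816},
}


def degrees_of_freedom(n_rows, n_cols):
    """df = (rows - 1) * (columns - 1)."""
    return (n_rows - 1) * (n_cols - 1)


def strictest_level(chi2, df):
    """Return the smallest tabled level whose critical value chi2 reaches, else None."""
    reached = [level for level, crit in CRITICAL_VALUES[df].items() if chi2 >= crit]
    return min(reached) if reached else None


df = degrees_of_freedom(2, 2)
level = strictest_level(10.5, df)
print(df, level)  # 1 0.01
```

A chi-square of 10.5 clears the .01 value but falls just short of the .001 value, so the result is reported as p < .01.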
13
Class exercise
Hypothesis: More building alarms → Less crime
- Randomly sampled 120 businesses with alarms: 50 had crimes, 70 didn’t
- Randomly sampled 90 businesses without alarms: 50 had crimes, 40 didn’t
1. Build the observed and expected tables. Remember, they’re tables, so place the values of the independent variable in rows.
2. Compute $\chi^2 = \sum \frac{(O - E)^2}{E}$
3. Use the table to assess whether the null hypothesis can be rejected. df = (r - 1) × (c - 1)
4. Convey your findings using simple words. What do the data show about building alarms and crime? How certain are you of your conclusions?
14
Observed (obtained) frequencies
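              Crime
Alarm        Y      N   Total
Y           50     70     120
N           50     40      90
Total      100    110     210

Expected (by chance) frequencies
              Crime
Alarm        Y      N   Total
Y           57     63     120
N           43     47      90
Total      100    110     210

$$\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(50-57)^2}{57} + \frac{(70-63)^2}{63} + \frac{(50-43)^2}{43} + \frac{(40-47)^2}{47} = 3.82$$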
15
χ² = 3.82
df = (r - 1) × (c - 1) = (2 - 1) × (2 - 1) = 1
- To reject the null hypothesis at the .05 level we need a χ² of 3.84 or greater.
- Our chi-square is smaller, so the probability that the result is due to chance is greater than the maximum of five in one hundred (it defaults to the next level in the table, .10, or ten chances in one hundred).
- So we cannot reject the null hypothesis: there is NO significant relationship between crime and alarms.
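The alarm exercise can be reproduced with a short Python sketch. It assumes alarm status in rows and crime in columns, and computes χ² twice: once with expected frequencies rounded to whole numbers, as on the slide, and once with the unrounded values. The helper names are hypothetical.

```python
def expected_table(observed):
    """Expected cell = (row total / grand total) * column total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square(observed, expected):
    """Sum (O - E)^2 / E over corresponding cells."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Rows: alarm Yes, alarm No. Columns: crime Yes, crime No.
observed = [[50, 70], [50, 40]]

exact = expected_table(observed)                        # e.g. 57.14..., 62.86...
rounded = [[round(e) for e in row] for row in exact]    # whole numbers, as on the slide

chi2_rounded = chi_square(observed, rounded)
chi2_exact = chi_square(observed, exact)

print(rounded)
print(round(chi2_rounded, 2), round(chi2_exact, 2))
```

With rounded expected frequencies the sketch reproduces the slide's 3.82. Carrying the unrounded values gives roughly 3.98, which would sit just above the 3.84 cutoff, so the conclusion in this exercise is sensitive to how the expected frequencies are rounded.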
16
Parking lot exercise
- Graph the distribution of car values for each parking lot.
  [Figure: car value distributions for the student lot and the faculty lot]
- Prepare the frequency and percentage tables.

Frequencies
                       DV - Car value
IV - Income            LOW   MED   HIGH     n
LOW (student lot)       10     0      0    10
HIGH (F/S lot)           4     0      6    10

Percentages
                       DV - Car value
IV - Income            LOW   MED   HIGH     %
LOW (student lot)     100%    0%     0%   100
HIGH (F/S lot)         40%    0%    60%   100

- Is there a relationship? Is it in the hypothesized direction?
17
Frequencies observed and frequencies expected
Use the frequency table to create a “frequencies expected” table (expected if the null hypothesis of no relationship is correct).

Frequencies observed
                       DV - Car value
IV - Income            LOW   MED   HIGH     n
LOW (student lot)       10     0      0    10
HIGH (F/S lot)           4     0      6    10
Total                   14     0      6    20

Expected cell frequency = (row marginal / total cases) × column marginal
Example, student lot / LOW: (10 / 20) × 14 = 7

Frequencies expected
                       DV - Car value
IV - Income            LOW   MED   HIGH     n
LOW (student lot)        7     0      3    10
HIGH (F/S lot)           7     0      3    10
Total                   14     0      6    20
18
Frequencies observed Frequencies expected
Compute χ²: Cell by corresponding cell, subtract EXPECTED from OBSERVED. Square each difference. Divide each result by EXPECTED. Then total them up.

Frequencies observed
                       DV - Car value
IV - Income            LOW   MED   HIGH     n
LOW (student lot)       10     0      0    10
HIGH (F/S lot)           4     0      6    10
Total                   14     0      6    20

Frequencies expected
                       DV - Car value
IV - Income            LOW   MED   HIGH     n
LOW (student lot)        7     0      3    10
HIGH (F/S lot)           7     0      3    10
Total                   14     0      6    20

$$\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(10-7)^2}{7} + \frac{(0-3)^2}{3} + \frac{(4-7)^2}{7} + \frac{(6-3)^2}{3} = 1.29 + 3 + 1.29 + 3 = 8.58$$

The two MED cells have observed and expected frequencies of zero, (0 - 0)², and add nothing to the total.

df = (r - 1) × (c - 1) = (2 - 1) × (3 - 1) = 1 × 2 = 2
19
Check the table. Is the Chi-square large enough to reject the null hypothesis at the .05 level? If not, the null hypothesis cannot be rejected. If it is large enough, is it so large that the probability of a chance result is even lower?
df = (r - 1) × (c - 1) = (2 - 1) × (3 - 1) = 2
χ² = 8.58
[Figure labels: Null hypothesis is true / Reject null hypothesis]
- The greatest risk we can take of wrongly rejecting the null hypothesis is five in one hundred (.05).
- Our Chi-square, 8.58, is greater than 5.991, the required minimum. It falls between columns, so we fall back to the .02 column.
- We can thus reject the NULL hypothesis and accept the WORKING hypothesis that higher-income persons drive more expensive cars, with only TWO chances in 100 of being wrong.
- A larger Chi-square could have reduced the risk to one in one hundred (.01), or even one in one thousand (.001).
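The parking lot result can be checked with a Python sketch. Two assumptions are made loudly here: cells with an expected frequency of zero (the empty MED column) are skipped rather than divided by zero, and the tail probability uses the closed form that holds only for df = 2, where P(χ² ≥ x) = exp(-x/2). The function name is hypothetical.

```python
import math


def chi_square_skip_empty(observed, expected):
    """Sum (O - E)^2 / E, skipping cells whose expected frequency is zero.

    Skipping is an assumption of this sketch: it mirrors the slide, where the
    empty MED column adds nothing to the total.
    """
    total = 0.0
    for obs_row, exp_row in zip(observed, expected):
        for o, e in zip(obs_row, exp_row):
            if e == 0:
                continue  # avoid 0/0 for the empty column
            total += (o - e) ** 2 / e
    return total


# Rows: student lot, F/S lot. Columns: LOW, MED, HIGH car value.
observed = [[10, 0, 0], [4, 0, 6]]
expected = [[7, 0, 3], [7, 0, 3]]

chi2 = chi_square_skip_empty(observed, expected)

# For df = 2 only, the chi-square tail probability is exp(-x / 2).
p_value = math.exp(-chi2 / 2)

print(round(chi2, 2), round(p_value, 4))
```

Without rounding each term first, the total is about 8.57 rather than 8.58. Either way it lies between the .02 and .01 critical values for df = 2 (7.824 and 9.210), which matches falling back to the .02 column.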
20
Homework
21
Hypothesis: Sergeants have more stress than patrol officers
Homework exercise
1. Calculate the expected cell frequencies (assuming the null hypothesis of no relationship is true).
2. Compute Chi-square.
3. Use the table in Appendix E to determine your chi-square’s probability level.
4. Can we reject the null hypothesis?
22
Homework answer
[Tables: Observed, Expected]

$$\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(30-52)^2}{52} + \frac{(60-38)^2}{38} + \frac{(86-64)^2}{64} + \frac{(24-46)^2}{46} = 40.1$$
23
χ² = 40.1
df = (r - 1) × (c - 1) = (2 - 1) × (2 - 1) = 1
To reject at the .05 level we need χ² = 3.84 or greater.
Reject the null hypothesis: there is less than 1 chance in 1,000 that the relationship is due to chance.
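The homework answer can be verified with a Python sketch. The cell layout is an assumption: the four observed values are taken in the order they appear in the computation (30, 60 in one row; 86, 24 in the other), since the original tables did not survive as text. Helper names are hypothetical.

```python
def expected_table(observed):
    """Expected cell = (row total / grand total) * column total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square(observed, expected):
    """Sum (O - E)^2 / E over corresponding cells."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Assumed layout: values in the order used in the homework computation.
observed = [[30, 60], [86, 24]]

# Rounded to whole numbers, as in the homework answer.
expected = [[round(e) for e in row] for row in expected_table(observed)]
chi2 = chi_square(observed, expected)

print(expected)
print(round(chi2, 1))
print(chi2 > 10.828)  # exceeds the df = 1 critical value for the .001 level
```

The rounded expected frequencies come out to 52, 38, 64, and 46, matching the denominators in the homework answer.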
24
Practice for the final
25
- You will test a hypothesis using two categorical variables and determine whether the independent variable has a statistically significant effect.
- You will be asked to state the null hypothesis.
- You will use supplied data to create an Observed frequencies table.
- You will use it to create an Expected frequencies table. You will be given a formula but should know the procedure.
- You will compute the Chi-Square statistic and degrees of freedom. You will be given formulas but should know the procedures by heart.
- You will use the Chi-Square table to determine whether the results support the working hypothesis.
Print and bring to class:
Sample question: The hypothesis is that alarm systems prevent burglary. A random sample of 120 businesses with an alarm system and 90 without was drawn. Fifty businesses of each kind were burglarized.
Null hypothesis: No significant difference in crime between businesses with and without alarms.
[Tables: Observed frequencies, Expected frequencies]
26
Observed frequencies Expected frequencies
$$\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(50-57)^2}{57} + \frac{(70-63)^2}{63} + \frac{(50-43)^2}{43} + \frac{(40-47)^2}{47} = 3.82$$

Chi-Square = 3.82
df = (r - 1) × (c - 1) = 1
Check the table. Do the results support the working hypothesis?
No. Chi-Square must be at least 3.84 to reject the null hypothesis of no relationship between alarm systems and crime at the .05 level, that is, with fewer than five chances in 100 that the result is due to chance. Here the Chi-Square is slightly too small!