Lecture 8 Chi-Square STAT 3120 Statistical Methods I.

Presentation transcript:


STAT3120 – Chi Square
Dependent Variable | Independent (predictor) Variable | Statistical Test | Comments
Quantitative | Categorical | T-TEST (one, two or paired sample) | Determines if categorical variable (factor) affects dependent variable; typically used for experimental or planned change studies
Quantitative | Quantitative | Correlation / Regression Analysis | Test establishes a regression model; used to explain, predict or control dependent variable
Categorical | Categorical | Chi-Square | Tests if variables are statistically independent (i.e. are they related or not?)

When presented with categorical data, one common method of analysis is the “Contingency Table” or “Cross Tab”. This is a great way to display frequencies. For example, let's say that a firm has the following data:
–120 male and 80 female employees
–40 males and 10 females have been promoted

Using this data, we could create the following 2x2 matrix:
          Promoted   Not Promoted   Total
Male         40           80         120
Female       10           70          80
Total        50          150         200

Now, a few questions…
1) From the data, what is the probability of being promoted?
2) Given that you are MALE, what is the probability of being promoted?
3) Given that you are promoted, what is the probability that you are MALE?
4) Given that you are FEMALE, what is the probability of being promoted?
5) Given that you are promoted, what is the probability that you are FEMALE?
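Each of the five questions is a ratio of a cell count to a row, column, or grand total. A minimal Python sketch, assuming the counts from the firm example (the lecture itself does not use Python, and the variable names are illustrative):

```python
# Counts from the firm example: 120 males (40 promoted), 80 females (10 promoted).
counts = {
    ("Male", "Promoted"): 40,
    ("Male", "Not Promoted"): 80,
    ("Female", "Promoted"): 10,
    ("Female", "Not Promoted"): 70,
}

total = sum(counts.values())
promoted = sum(n for (gender, status), n in counts.items() if status == "Promoted")
males = sum(n for (gender, status), n in counts.items() if gender == "Male")
females = total - males

p_promoted = promoted / total                                        # question 1
p_promoted_given_male = counts[("Male", "Promoted")] / males         # question 2
p_male_given_promoted = counts[("Male", "Promoted")] / promoted      # question 3
p_promoted_given_female = counts[("Female", "Promoted")] / females   # question 4
p_female_given_promoted = counts[("Female", "Promoted")] / promoted  # question 5

for name, value in [
    ("P(promoted)", p_promoted),
    ("P(promoted | male)", p_promoted_given_male),
    ("P(male | promoted)", p_male_given_promoted),
    ("P(promoted | female)", p_promoted_given_female),
    ("P(female | promoted)", p_female_given_promoted),
]:
    print(f"{name} = {value:.3f}")
```

If promotion and gender were unrelated, the answers to questions 2 and 4 would both be close to the answer to question 1.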

The answers to these questions help us start to understand if promotion status and gender are related. Specifically, we could test this relationship using a Chi-Square. This is the test used to determine if two variables are related. The relevant hypothesis statements for a Chi-Square test are:
–H0: Variable 1 and Variable 2 are NOT Related
–Ha: Variable 1 and Variable 2 ARE Related
Develop the appropriate hypothesis statements and testing matrix for the gender/promotion data.

The Chi-Square Test uses the χ² test statistic, which has a distribution that is skewed to the right (it approaches normality as the degrees of freedom increase). You can see an example of the distribution on pg 641. The χ² test statistic calculation can be found on page 640. The observed counts are provided in the dataset. The expected counts are the counts which would be expected if there were NO relationship between the two variables.
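For reference, the standard Pearson form of the statistic is given here; the textbook page cited on the slide is not reproduced in this transcript, so the notation is an assumption. O_i is the observed count and E_i the expected count in cell i, summed over all cells:

```latex
\chi^2 = \sum_{i} \frac{(O_i - E_i)^2}{E_i}
```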

Going back to our example, the data provided is “observed”:
          Promoted   Not Promoted   Total
Male         40           80         120
Female       10           70          80
Total        50          150         200
What would the matrix look like if there were no relationship between promotion status and gender? The resulting matrix would be “expected”…

From the data, 25% of all employees were promoted. Therefore, if gender plays no role, then we should see 25% of the males promoted (75% not promoted) and 25% of the females promoted (75% not promoted)…
          Promoted        Not Promoted     Total
Male      120*.25 = 30    120*.75 = 90      120
Female     80*.25 = 20     80*.75 = 60       80
Total          50              150          200
Notice that the marginal values did not change…only the interior values changed.
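The same expected matrix can be derived from the margins alone, since each expected count is (row total * column total) / grand total. A short Python sketch under that assumption, using the observed counts from the example (names are illustrative):

```python
# Observed counts: rows are Male, Female; columns are Promoted, Not Promoted.
observed = [[40, 80],
            [10, 70]]

row_totals = [sum(row) for row in observed]        # [120, 80]
col_totals = [sum(col) for col in zip(*observed)]  # [50, 150]
n = sum(row_totals)                                # 200

# Expected count under "no relationship": row total * column total / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# The margins of the expected table match the margins of the observed table.
expected_row_totals = [sum(row) for row in expected]
expected_col_totals = [sum(col) for col in zip(*expected)]

print(expected)
```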

Now, calculate the χ² statistic using the observed and the expected matrices:
χ² = ((40-30)²/30) + ((80-90)²/90) + ((10-20)²/20) + ((70-60)²/60) = 3.33 + 1.11 + 5.00 + 1.67 = 11.11
This is conceptually equivalent to a t-statistic or a z-score.
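The cell-by-cell sum can be checked with a few lines of Python. This is only a sketch of the hand calculation, assuming the observed and expected matrices from the promotion example; the function name is illustrative:

```python
def chi_square_statistic(observed, expected):
    """Sum (O - E)^2 / E over every cell of two same-shaped tables."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )

observed = [[40, 80], [10, 70]]
expected = [[30, 90], [20, 60]]

chi_square = chi_square_statistic(observed, expected)
print(round(chi_square, 2))
```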

To determine if this is in the rejection region, we must determine the df and then use the table on page 732. df = (r-1)*(c-1). In the current example, we have two rows and two columns, so df = 1*1 = 1. At alpha = .05 and 1 df, the critical value is 3.84. Our value of 11.11 is clearly in the rejection region…so what does this mean?
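The decision can also be expressed as a p-value. The sketch below relies on an identity that holds only for 1 df: a chi-square variable with 1 df is the square of a standard normal, so its upper-tail probability is erfc(sqrt(x/2)). For other df a statistics library or the printed table is needed. The function names are illustrative.

```python
import math

def degrees_of_freedom(n_rows, n_cols):
    """df for a contingency table: (r - 1) * (c - 1)."""
    return (n_rows - 1) * (n_cols - 1)

def p_value_df1(chi_square):
    """Upper-tail probability for a chi-square statistic with exactly 1 df."""
    return math.erfc(math.sqrt(chi_square / 2))

chi_square = 100 / 9        # 11.11 from the promotion example
critical_value = 3.84       # table value at alpha = .05, 1 df

df = degrees_of_freedom(2, 2)
p_value = p_value_df1(chi_square)
reject_null = chi_square > critical_value

print(df, reject_null, p_value < 0.05)
```

Both routes agree: the statistic exceeds the critical value and the p-value is below .05, so the hypothesis that promotion status and gender are unrelated is rejected.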

In the book Outliers, Malcolm Gladwell makes the point that the month in which a boy is born affects his probability of playing in the NHL. The months of birth for players in the NHL are on the next page… (data taken from ge=merron/081208)

Month       Players
January       51
February      46
March         61
April         49
May           46
June          49
July          36
August        41
September     36
October       34
November      33
December      30
Now, if there is NO relationship between birth month and playing hockey, what SHOULD the distribution of months look like? Let's do this one in EXCEL… Note that this is technically referred to as a “goodness of fit” test, where we are assessing if the actual distribution “fits” what would be expected.
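The lecture works this example in Excel; the Python sketch below shows the same goodness-of-fit arithmetic. It assumes that "no relationship" means every month is equally likely, which ignores the differing lengths of months, and it uses the table critical value 19.675 for 11 df at alpha = .05.

```python
# NHL player birth counts by month, January through December.
births = [51, 46, 61, 49, 46, 49, 36, 41, 36, 34, 33, 30]

n = sum(births)               # total number of players
expected = n / len(births)    # assumption: equal share for each month

chi_square = sum((o - expected) ** 2 / expected for o in births)
df = len(births) - 1          # categories minus one

critical_value = 19.675       # chi-square table, 11 df, alpha = .05
reject_null = chi_square > critical_value

print(n, df, round(chi_square, 2), reject_null)
```

Under these assumptions the statistic exceeds the critical value, so the observed birth months do not fit an even spread across the year.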

Practice Problems for Chi-Square: For all of these, identify the hypothesis statements, the testing matrix, and the decision.

Categorical Example Using credit data.

Credit Sample Data Set
–Purchase: 1 = $250+, 0 = < $250
–Age: customer age
–Gender: male, female
–Income: Low, Medium, High

What do we have?
Predictors
–Gender
–Income
–Age
Outcome
–GT $250
–LT $250

Determine ‘Scale’
Nominal variables:
–Values with no logical ordering.
»Gender
Ordinal variables:
–Values with a logical ordering.
»Income

Let's Examine!
–Determine the distribution of categorical values
–Recognize possible associations among variables
Association?
–An association exists between two variables when the distribution of one variable changes as the level or value of the other changes.
–No changes? The distribution of the variable is the same regardless of the level of the other variable.

Determine Association
No Association?
–Statistics professor's temperament changes with golf.
            Great golf   Bad golf
Sunshine       65%         35%
Raining        65%         35%

Watch Out!
Association?
–Statistics professor's temperament changes with golf.
            Great golf   Bad golf
Sunshine       95%          5%
Raining        30%         70%

Crosstabulation Table
The table shows the number of observations for each combination of the row and column variables.
          Column 1   Column 2   …   Column c
Row 1     Cell 11    Cell 12    …   Cell 1c
Row 2     Cell 21    Cell 22    …   Cell 2c
…         …          …          …   …
Row r     Cell r1    Cell r2    …   Cell rc
–Frequency: number of observations falling into a category formed by the row variable and the column variable
–Percent: number of observations in each cell as a percentage of the total number of observations
–Row percent: number of observations in each cell as a percentage of the total number of observations in that row
–Col percent: number of observations in each cell as a percentage of the total number of observations in that column
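The four cell statistics differ only in the denominator. A small Python sketch, assuming the promotion counts from the earlier example as illustrative data (the function and key names are hypothetical, not SAS output names):

```python
# Illustrative counts: rows are Male, Female; columns are Promoted, Not Promoted.
table = [[40, 80],
         [10, 70]]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
grand_total = sum(row_totals)

def cell_summary(i, j):
    """Frequency, percent, row percent and column percent for cell (i, j)."""
    freq = table[i][j]
    return {
        "frequency": freq,
        "percent": 100 * freq / grand_total,    # share of all observations
        "row_pct": 100 * freq / row_totals[i],  # share of that row
        "col_pct": 100 * freq / col_totals[j],  # share of that column
    }

summary = cell_summary(0, 0)  # Male, Promoted
print(summary)
```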

Distributions
SAS Freq procedure
–Examine distributions
–Ordering values

SAS Proc Freq Distributions
/* Library and output locations */
libname JLLP 'E:\JenniferPriestly\Chi_Square';
%let outpath=E:\JenniferPriestly\Chi_Square;
%let libpath=E:\JenniferPriestly\Chi_Square;
options nodate nonumber ls=95 ps=80;
run;

/* Labels for the 0/1 purchase indicator */
proc format;
  value purfmt 1 = "$ 100 +"
               0 = "< $100";
run;

ods graphics on;
ods listing close;
ods rtf path="&outpath" style=journal file='freq.rtf';

/* One-way and two-way frequency tables with frequency plots */
proc freq data=JLLP.Online;
  tables purchase gender income gender*purchase income*purchase / plots(only)=(freqplot);
  format purchase purfmt.;
run;

/* Distribution of the continuous variable age */
ods select histogram probplot;
proc univariate data=JLLP.Online;
  var age;
  histogram age / normal (mu=est sigma=est);
  probplot age / normal (mu=est sigma=est);
run;

ods rtf close;
ods listing;

SAS Ordering Values Change Income
ods graphics on;
ods listing;

/* Recode income to a numeric level so that it sorts Low, Medium, High */
data JLLP.Online_inc;
  set JLLP.Online;
  if income='Low' then IncLevel=1;
  else if income='Medium' then IncLevel=2;
  else if income='High' then IncLevel=3;
run;

proc format;
  value incfmt 1='Low Income'
               2='Medium Income'
               3='High Income';
run;

ods graphics on;
ods rtf path="&outpath" style=statistical file='freq2.rtf';

proc freq data=JLLP.Online_inc;
  tables IncLevel*Purchase;
  format IncLevel incfmt. Purchase purfmt.;
  title1 'Change Variable IncLevel to Correct Income';
run;

ods rtf close;

Tests for Association
Determine
–Chi-square test for association
–Examine strength of the association
–Calculate exact p-value
–Cramer’s V

Chi-Square Test
ods graphics on;
ods rtf path="&outpath" style=statistical file='freq3.rtf';

proc freq data=JLLP.Online_inc;
  tables Gender*purchase / chisq expected cellchi2 nocol nopercent relrisk;
  format purchase purfmt.;
  title1 'Association Between Gender and Purchase';
run;

ods rtf close;

Gender by Purchase
(SAS output: Table of Gender by Purchase. Each cell shows Frequency, Percent, Row Pct and Col Pct; columns are < $100, $ 100 + and Total; rows are Female, Male and Total.)

Chi-Square Test
No association: observed frequencies = expected frequencies
–Null Hypothesis: There is no association between Gender and Purchase. The probability of purchasing items worth more than $100 is the same for both sexes.
Association: observed frequencies ≠ expected frequencies
–Alternative Hypothesis: There is an association between Gender and Purchase. The probability of purchasing items worth more than $100 is not the same for both sexes.

Pearson Chi-square Test
–Commonly used test to determine whether there is an association between 2 categorical variables
–The test measures the difference between the observed cell frequencies and the cell frequencies that are expected if there is no association between the variables
–A significant test statistic is strong evidence that an association exists

Frequencies Calculation
Expected frequencies are calculated by:
»(row total * column total) / sample size
If there is no association between the row and column variables, the expected percentage in any cell equals the percentage in that cell's row (R/T) times the percentage in that cell's column (C/T). The expected count is the expected percentage times the total sample size (T):
»Expected count = (R/T)*(C/T)*T = (R*C)/T

Chi-square tests
Measures of association
–The p-value only indicates how strong the evidence is against the null hypothesis of no association; it does not measure the strength of the association.
–Cramer’s V statistic: measures association between two nominal variables. Ranges from -1 to 1 for a 2-by-2 table and from 0 to 1 for larger tables. Values further from 0 indicate the presence of a relatively strong association.
–Odds ratio: indicates how much more likely, with respect to odds, a certain event is to occur in one group relative to its occurrence in another group.
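Cramer's V can be computed directly from the chi-square statistic. The sketch below uses the common unsigned formula V = sqrt(χ² / (n * min(r-1, c-1))), so it returns only the magnitude; the signed value that SAS reports for 2-by-2 tables is not reproduced. The inputs are the promotion example from earlier in the lecture (χ² = 11.11, n = 200), not the credit data, whose values are not preserved here.

```python
import math

def cramers_v(chi_square, n, n_rows, n_cols):
    """Unsigned Cramer's V for an r-by-c table with n observations."""
    return math.sqrt(chi_square / (n * min(n_rows - 1, n_cols - 1)))

# Promotion example: chi-square = 100/9 (about 11.11), 200 employees, 2x2 table.
v = cramers_v(100 / 9, 200, 2, 2)
print(round(v, 3))
```

A significant chi-square combined with a small V is a typical pattern: the association is unlikely to be due to chance, yet it is not strong.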

Odds Ratio

Probability and odds of an outcome
          No    Yes   Total
Group A
Group B   10    90    100
Total
–Prob of a Yes outcome in Group B = 90/100 (.90)
–Prob of a No outcome in Group B = 10/100 (.10)

Odds Ratio
Odds of outcome in Group B
».90 / .10 = 9
Odds of outcome in Group A
».75 / .25 = 3
Odds Ratio of Group B to Group A
»9 / 3 = 3
The odds of the outcome in Group B are 3 times the odds in Group A.

Properties of the Odds Ratio, B to A
The odds ratio shows the strength of the association.
–If the odds ratio is 1, then there is no association.
–If the odds ratio is greater than 1, then Grp B is more likely to have the outcome.
–If the odds ratio is less than 1, then Grp A is more likely to have the outcome.
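The odds-ratio arithmetic for Groups A and B can be sketched in Python as follows. It assumes the outcome probabilities quoted on the slides (.75 for Group A, .90 for Group B); the function name is illustrative.

```python
def odds(probability):
    """Convert a probability of an outcome into odds: p / (1 - p)."""
    return probability / (1 - probability)

odds_a = odds(0.75)   # Group A: .75 / .25
odds_b = odds(0.90)   # Group B: .90 / .10

odds_ratio_b_to_a = odds_b / odds_a

# Percent difference in odds, (OR - 1) * 100: positive means higher odds in B.
percent_difference = (odds_ratio_b_to_a - 1) * 100

print(round(odds_a, 2), round(odds_b, 2), round(odds_ratio_b_to_a, 2))
```

An odds ratio above 1 here means Group B is more likely to have the outcome, matching the second property listed above.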

Example Determine association between Gender and purchase. Generate expected cell frequencies and the cell’s contribution to the total chi-square statistic

Results
(SAS output: Table of Gender by Purchase. Each cell shows Frequency, Expected, Cell Chi-Square and Row Pct; columns are < $100, $ 100 + and Total; rows are Female, Male and Total.)
Calculate the cell Chi-square.

Results
(SAS output: Chi-Square, Likelihood Ratio Chi-Square, Continuity Adj. Chi-Square, Mantel-Haenszel Chi-Square, Phi Coefficient, Contingency Coefficient and Cramer's V, each with DF, Value and Prob; Fisher's Exact Test with Cell (1,1) Frequency (F) = 139, Left-sided Pr <= F, Right-sided Pr >= F, Table Probability (P) and Two-sided Pr <= P; Estimates of the Relative Risk (Row1/Row2) with Value and 95% Confidence Limits for Case-Control (Odds Ratio), Cohort (Col1 Risk) and Cohort (Col2 Risk).)
–The p-value is < .05, so reject the null hypothesis (Appendix A.5: .025 < p-value < .05).
–Cramer’s V indicates the association is relatively weak.
–The relative risk estimates, with 95% confidence limits, compare males with females for the right column ($ 100 +).
–The odds ratio is about 0.65: the odds of a male purchasing more than $100 are about 65% of the odds for a female. (OR-1)*100 = (0.6458-1)*100 = -35.42%, so males have 35.42% lower odds than females.

Gender by Purchase