Chi-Square and odds ratios

Semester Recap
We’ve covered:
Descriptive Statistics: Measures of Central Tendency, Measures of Variability, Z-scores and Graphing
Association and Prediction: Correlation, Regression (simple and multiple)
Testing for Group Differences: t-tests (one-sample, independent, and paired), ANOVA (one-way, factorial, ANCOVA, RM ANOVA)
Statistical Concepts: Statistical Significance, Type I and Type II Error, Alpha and p-values, Beta and power, Effect Sizes

Last weeks…
All of the statistical tests on the prior slide are known as ‘parametric’ statistics.
Parametric statistics have strict assumptions that must be met before a t-test, correlation, etc. can be used:
They assume homoscedasticity (the variances being compared are similar).
They assume a normal distribution (bell-shaped curve).
These assumptions are not easily met. For example, we’ve used physical activity in several of our examples this semester, and physical activity is RARELY normally distributed…

Is this a normal distribution?
[Figure: distribution of minutes of daily physical activity (PA) in a nationally representative sample]
What kind is it? Positive skew.

Additionally…
Sometimes you need to use a dependent variable that is categorical (grouping).
Recall from the chart that the statistical tests we’ve discussed all require 1 continuous dependent variable.
Example research question: Are there more male than female athletic trainers across the United States?
I want to know if there are more men than women, so this is a nominal dependent variable.
No correlation, regression, ANOVA, etc. will help me answer this simple question.

Non-Parametric Tests
Non-parametric tests can be used when:
The parametric test’s statistical assumptions are not met
Categorical DV’s are used
Non-parametric statistics is an entirely different line of statistics that includes dozens of new tests.
Usually a parametric test has a non-parametric ‘relative’: the Chi-Square Test of Independence is similar to the Pearson Correlation.
Most of the time, if you can’t meet the assumptions of the parametric test you wanted to do, you can find an appropriate non-parametric test.

Non-Parametric Tests (cont)
Non-parametric tests have benefits and drawbacks like all statistical tests.
Benefits:
Assumptions are easier to meet. Non-parametric tests basically just need nominal or ordinal data. Remember that interval or ratio data scales can always be converted to nominal or ordinal; the inverse is not true.
More ‘robust’. Non-parametric statistics are more versatile tests. Recall that small changes in our data required completely different parametric tests (e.g., the difference between a t-test and ANOVA).
Usually easier to calculate. We can realistically hand-calculate several non-parametric tests.

Non-Parametric Tests (cont)
Non-parametric tests have benefits and drawbacks like all statistical tests.
Drawbacks:
Less ‘mainstream’. Non-parametric tests are commonly used, but far less frequently than parametric tests.
Less powerful. By design, non-parametric tests have less statistical power… What’s that mean?
Unless you NEED to use a categorical dependent variable, if your data meet the statistical assumptions you should use parametric stats.

Tonight
We will discuss and use two non-parametric tests:
Chi-square test of independence
Logistic Regression (Odds Ratio)
These are the most common non-parametric tests.
Both are related to each other (often used together).
They do NOT require a normal distribution or homoscedasticity.
They do require 2 categorical variables (nominal or ordinal).

Chi-Square, χ2
There are actually a few different types of Chi-Square tests; we will discuss the Chi-Square test of independence.
The test determines if two variables are related (or unrelated), hence ‘test of independence’. It is similar to the Pearson correlation.
For example, we would expect that Sex (Male/Female) is NOT related to hair color (dark/light): men and women are just as likely to have dark or light hair.
Sex (Male/Female) is related to having a heart attack (Yes/No): men have more heart attacks than women do.
For these tests, start thinking about data in 2x2 tables…

Chi-Square Data ‘Picture’
Cause of Death in Men and Women
              Heart Attack?
Sex           Yes     No
Men           68      32
Women         42      58
A 2x2 table provides a nice summary of the data.
In this example, ‘Sex’ is the IV and ‘Heart Attack’ is the DV: does male/female increase risk of heart attack?
This table provides frequency of occurrence. You can also convert to percentages; you will get the same result.

SPSS View
Data structure: key variables are categorical.
Can look at the data labels or values:
Males = 1, Females = 2
HeartAttack Yes = 1, HeartAttack No = 2

Chi-Square Data ‘Picture’
Cause of Death in Men and Women
              Heart Attack?
Sex           Yes     No
Men           68      32
Women         42      58
You should fill out the margins of the table (how many men, women, total n, heart attacks, other causes, etc.). Do on board.
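
The margins of the table can also be filled in with a few lines of code. This is a minimal sketch rather than part of the original slides; the variable names are illustrative, and it uses only the four counts shown in the table above.

```python
# Observed counts from the slide: rows = sex, columns = heart attack (Yes/No).
observed = {
    "Men": {"Yes": 68, "No": 32},
    "Women": {"Yes": 42, "No": 58},
}

# Row margins: how many men and how many women.
row_totals = {sex: sum(cells.values()) for sex, cells in observed.items()}

# Column margins: how many heart attacks and how many other causes.
col_totals = {
    outcome: sum(observed[sex][outcome] for sex in observed)
    for outcome in ("Yes", "No")
}

# Total sample size.
n = sum(row_totals.values())

# Row percentages: share of each sex in each outcome column.
row_percent = {
    sex: {outcome: 100 * count / row_totals[sex] for outcome, count in cells.items()}
    for sex, cells in observed.items()
}

print(row_totals)
print(col_totals)
print(n)
print(row_percent)
```

Because both rows happen to total 100 here, the row percentages equal the raw counts.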

How the Chi-Square works…
Cause of Death in Men and Women
              Heart Attack?
Sex           Yes     No
Men           68      32
Women         42      58
The χ2 test has a null hypothesis that there is no difference in the frequency of men/women having heart attacks.
If the two variables are unrelated (independent), we would expect men and women to have about the same number of heart attacks.
Here the counts differ (68 vs. 42), but we need a statistical test to know whether this difference could be due to random sampling error (RSE).
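
To make the logic concrete, here is a hedged sketch of the Pearson chi-square calculation done by hand for this table (no continuity correction). The code is an illustration added to the transcript, not SPSS's implementation: expected counts come from (row total x column total) / n, and the statistic sums (observed - expected)^2 / expected over the four cells.

```python
# Observed 2x2 table: rows = Men, Women; columns = Heart Attack Yes, No.
observed = [[68, 32], [42, 58]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count in each cell if sex and heart attack were independent.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Pearson chi-square: sum of (observed - expected)^2 / expected over all cells.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

print(expected)
print(round(chi_square, 3))
```

Under independence each sex would be expected to have 55 heart attacks and 45 other causes, and the resulting statistic rounds to the 13.657 that appears in the SPSS output discussed later.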

How to run Chi-Square

It is CRITICAL you put the variables in the correct spots.
Typically the IV goes in the Row and the DV goes in the Column.
Then click on ‘statistics’.
It doesn’t really change the answer, but it makes it easier for you to understand the results.

What else is there? The ‘cells’ tab will allow you to request percentages in each cell, to go along with the frequencies The ‘format’ tab will allow you to change the organization of your table E.g., put ‘Females’ on the top row, or put ‘No Heart Attack’ in the left column

SPSS Output
SPSS provides two initial tables:
1) Case Processing Summary: Ignore, repeat info of…
2) CrossTabs Table (our 2x2): shows frequencies. Could have asked for percentages.

Chi-Square Tests
We only care about the ‘Pearson Chi-Square’. Yeah, it’s that same guy from correlation…
Important info is the χ2 = 13.657, df, p, and n.
χ2 = 13.66 is just like the t-statistic, or r, or the F ratio...

Chi-Square df
df is calculated by (number of columns - 1) multiplied by (number of rows - 1).
2 rows - 1 = 1
2 columns - 1 = 1
1 x 1 = 1 df
All 2x2 tables have 1 df; variables with more categories (more rows or columns) will change this.
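
The df rule, and the p-value that goes with a 1-df chi-square, can be sketched as follows. This is an illustrative addition with hypothetical function names. The p-value shortcut is valid only for df = 1, where a chi-square variable is a squared standard normal, so its upper tail equals erfc(sqrt(χ2 / 2)).

```python
import math

def chi_square_df(n_rows, n_cols):
    """Degrees of freedom for a test of independence on an r x c table."""
    return (n_rows - 1) * (n_cols - 1)

def p_value_df1(chi_square):
    """Upper-tail p-value for a chi-square statistic with exactly 1 df."""
    # Only valid for df = 1: P(Z^2 > x) = erfc(sqrt(x / 2)) for standard normal Z.
    return math.erfc(math.sqrt(chi_square / 2))

df = chi_square_df(2, 2)   # the 2x2 heart attack table
p = p_value_df1(13.657)    # chi-square value from the SPSS output

print(df)
print(p < 0.001)
print(chi_square_df(3, 4))  # e.g. a table with 3 rows and 4 columns
```

For tables larger than 2x2 the p-value needs the general chi-square distribution, which a statistics package provides.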

Reporting the results:
A chi-square test of independence was used to compare the frequency of heart attacks between men and women. Sex was significantly related to having a heart attack. Men tended to have more heart attacks than women (χ2(1) = 13.66, p < 0.001).
Questions on Chi-Square?

Logistic Regression
Researchers often report just the Chi-Square test results. However, it is also common for them to incorporate logistic regression and/or odds ratios.
Quick definition of Logistic Regression:
A type of regression equation that uses a categorical DV, such as heart attack yes/no from our example.
It allows you to include any type of IV (categorical or continuous) and any number of IV’s. In this sense, it is very similar to simple or multiple linear regression.
Instead of providing you with a slope, it provides an odds ratio for each IV.

What are the odds…?
Odds: the probability of an event happening divided by the probability of the event not happening.
Students often get confused here…
A die has 6 sides, each with a different number. On one die, the odds of rolling a 1 are…?
1 side has a 1 on it, 5 sides do NOT.
Odds of rolling a 1: 1/5, or 0.20 (1 to 5). Note that this is not the same as the probability of rolling a 1, which is 1/6, or about 16.7%.
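
The distinction between probability and odds in the die example can be checked with exact fractions. This is a small illustrative sketch; the function name is hypothetical.

```python
from fractions import Fraction

def odds_from_probability(p):
    """Odds = probability of the event / probability of no event."""
    return p / (1 - p)

# Probability of rolling a 1 on a fair six-sided die.
p_one = Fraction(1, 6)
odds_one = odds_from_probability(p_one)

print(p_one, float(p_one))        # probability
print(odds_one, float(odds_one))  # odds
```

The probability is 1/6 while the odds are 1/5, which is why odds and probability should not be used interchangeably.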

What’s an odds ratio…?
Odds Ratio: the odds of an event happening in one group divided by the odds of the event happening in another group. It is literally the ratio of two odds.
It acts like an effect size for a chi-square: the chi-square tells you if there is a difference, and the odds ratio tells you how big/strong that difference is.
If the chi-square is significant, the odds ratio is also statistically significant.
The literal interpretation is how many times greater the odds of an event are in one group versus another.
It’s easier to see in our example…

Back to our example
Cause of Death in Men and Women
              Heart Attack?
Sex           Yes     No
Men           68      32
Women         42      58
What are the odds of a heart attack in men? 68/32, or 2.125
What are the odds of a heart attack in women? 42/58, or 0.724

Back to our example
Cause of Death in Men and Women
              Heart Attack?
Sex           Yes     No
Men           68      32
Women         42      58
What is the ratio of these odds, or odds ratio? 2.125 / 0.724 = 2.9 = OR
Interpretation: The odds of a heart attack are 2.9 times as high for men as for women (we know it’s significant because of the χ2).
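
The hand calculation above can be written out as a short sketch. The helper name is illustrative; the cross-product line shows an equivalent shortcut (a x d) / (b x c) that gives the same odds ratio directly from the four cells.

```python
def odds(events, non_events):
    """Odds within one group: number with the event / number without it."""
    return events / non_events

odds_men = odds(68, 32)      # heart attack yes / no among men
odds_women = odds(42, 58)    # heart attack yes / no among women

# Odds ratio with women as the referent (denominator) group.
odds_ratio = odds_men / odds_women

# Equivalent cross-product form for a 2x2 table.
cross_product = (68 * 58) / (32 * 42)

print(round(odds_men, 3), round(odds_women, 3))
print(round(odds_ratio, 2), round(cross_product, 2))
```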

More on odds ratios
Interpreting odds ratios can trip up some students.
For example, 2.9 is the odds ratio for men vs. women: men have about 3 times the odds of a heart attack that women do. Being a man is a ‘risk factor’ for heart attack.
What is the odds ratio for women vs. men? 0.724 / 2.125 = 0.34. Women have about one-third the odds of a heart attack that men do. Being a woman is ‘protective’ against heart attack.
Odds Ratios:
> 1.0 indicate an increased risk
< 1.0 indicate a decreased risk
= 1.0 indicate the SAME risk

Another Example: Lung Cancer
Cause of Death in Men and Women
              Lung Cancer?
Sex           Yes     No
Men           6       64
Women         16      201
First, notice that way more women had lung cancer. But there are also way more women in this sample.
I’ll run a chi-square in SPSS to see if there is a difference…

Lung Cancer
Chi-Square results: χ2 = 2.451, df = 1, p = 0.456, n = 287
Is there a difference in the frequency of lung cancer between men and women?
A chi-square test revealed that there was no significant difference in the odds of lung cancer between men and women (χ2(1) = 2.451, p = 0.456).
Let’s calculate the odds ratio…

Odds Ratio: Lung Cancer
Cause of Death in Men and Women
              Lung Cancer?
Sex           Yes     No
Men           6       64
Women         16      201
What are the odds of cancer in men? 6/64 = 0.094
In women? 16/201 = 0.080
What is the OR? Odds Ratio = 0.094 / 0.080 = 1.18
It appears men might be slightly more likely than women, but this could be due to RSE.
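
Rounding the two odds before dividing can shift the second decimal of the odds ratio, so it is safer to divide the unrounded values. The sketch below (an illustrative addition, not SPSS output) computes the lung cancer odds ratio straight from the counts.

```python
# Lung cancer counts from the slide: (yes, no) for each sex.
men_yes, men_no = 6, 64
women_yes, women_no = 16, 201

odds_men = men_yes / men_no
odds_women = women_yes / women_no

# Odds ratio for men vs. women, computed from unrounded odds.
or_exact = odds_men / odds_women

print(round(odds_men, 4), round(odds_women, 4))
print(round(or_exact, 2))
```

From the unrounded counts the odds ratio comes out at about 1.18, still very close to 1.0.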

More on Odds Ratios
1) Take care in setting up your 2x2 table. This can make it really easy, or really hard, to calculate the odds and understand your chi-square.
2) As you can see, we are hand-calculating an odds ratio. You can get SPSS to do this for you. If you have only 1 IV (like these examples), it’s called “Risk” in the CrossTabs options. This is like simple linear regression.

New Heart Attack Output
When you do this, ignore the bottom rows of the box (they are more confusing than helpful).
You get the OR and the 95% CI.
This is the same result we got from hand-calculating it, and we knew it was significant because of the chi-square test.

95% CI’s and Odds Ratios Is this odds ratio statistically significant? How can you tell?
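
One common way to get a 95% CI for an odds ratio from a 2x2 table is the log (Woolf) method: take ln(OR) ± 1.96 x sqrt(1/a + 1/b + 1/c + 1/d) and exponentiate. The sketch below assumes that method; the SPSS output itself is not reproduced in this transcript, so treat the numbers as an illustration rather than a copy of that table.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and approximate 95% CI (log method).

    a, b = events / non-events in the group of interest
    c, d = events / non-events in the referent group
    """
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper

# Heart attack example: men (68 yes, 32 no) vs. women (42 yes, 58 no).
or_ha, lower_ha, upper_ha = odds_ratio_ci(68, 32, 42, 58)

print(round(or_ha, 2), round(lower_ha, 2), round(upper_ha, 2))
# If the interval excludes 1.0, the odds ratio is statistically significant.
print(lower_ha > 1.0 or upper_ha < 1.0)
```

The key reading rule is that an odds ratio of 1.0 means the same odds in both groups, so a 95% CI that does not contain 1.0 indicates significance at the 0.05 level.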

More on Odds Ratios 2) As you can see we are hand-calculating an odds ratio. You can get SPSS to do this for you. With 1 IV, use crosstabs With multiple IV’s, use ‘logistic regression’

I won’t ask you to do this – but you should know it’s there

More on Odds Ratios
3) Odds Ratios can be tricky without the 2x2 table.
They tell you how much more likely something is to happen for a certain group (like males vs females).
But you should always include the information in the 2x2 table so people know what you’re talking about.
Example…

Odds of Winning the Powerball
Odds of winning the Powerball lotto with 1 ticket: 1 / 175,223,510 = 0.00000000571
Odds of winning the Powerball with 10 tickets: 10 / 175,223,510 = 0.0000000571
Odds Ratio = 10.0
You are 10 times as likely to win with 10 tickets!!!!!
Still better odds to die driving to buy your ticket, or getting hit by lightning, than winning with 10 tickets.
Odds Ratios are great statistics, but should not be used to model VERY, VERY rare things.
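
The Powerball example can be sketched numerically to show how a large odds ratio can sit on top of a negligible absolute change. The code is illustrative and assumes the 10 tickets are all different combinations; the 175,223,510 figure is the one given on the slide.

```python
total_combinations = 175_223_510  # figure quoted on the slide

def win_probability(tickets):
    """Probability of winning with a given number of distinct tickets."""
    return tickets / total_combinations

def odds(p):
    """Convert a probability to odds."""
    return p / (1 - p)

p1 = win_probability(1)
p10 = win_probability(10)

odds_ratio = odds(p10) / odds(p1)
absolute_increase = p10 - p1

print(round(odds_ratio, 2))
print(absolute_increase)
```

The odds ratio is essentially 10, yet the absolute chance of winning rises by only about five in a hundred million, which is why the 2x2 counts should be reported alongside the odds ratio.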

Referent Group
4) Odds ratios are a ratio between two groups.
The odds ratio was 2.9 or 0.34 for our heart attack example, depending on which group was the referent.
Referent Group = the ‘baseline’ group in the odds ratio; the group that is in the denominator.
2.125 / 0.724 = 2.9: women were the referent group, meaning men had 2.9 times the odds of a heart attack compared with women.
You must mention this when talking about your odds ratio. You can’t just say, ‘the odds ratio was 2.9’. WHO had 2.9 times the odds of WHOM?
This is especially important when using them for more than 2 groups…
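
The effect of switching the referent group can be seen in a short sketch. The function and dictionary names are hypothetical; the counts are the heart attack table from earlier slides.

```python
def odds_ratio(table, group, referent):
    """Odds ratio for `group` relative to `referent`.

    `table` maps a group name to a tuple (events, non_events).
    The referent group's odds go in the denominator.
    """
    g_events, g_non = table[group]
    r_events, r_non = table[referent]
    return (g_events / g_non) / (r_events / r_non)

heart_attack = {"Men": (68, 32), "Women": (42, 58)}

or_men_vs_women = odds_ratio(heart_attack, "Men", "Women")   # women are the referent
or_women_vs_men = odds_ratio(heart_attack, "Women", "Men")   # men are the referent

print(round(or_men_vs_women, 2))
print(round(or_women_vs_men, 2))
```

The two values are reciprocals of each other, so they describe the same association; only the direction of the comparison changes.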

Example with multiple groups
Odds ratios from a current study I’m working on.
It compares the odds of obesity in children meeting recommendations for PA, screen time, and sleep.
Notice that WHO I choose to be the referent group influences all of the odds ratios.

QUESTIONS on Chi-Square? Odds Ratios?

Upcoming…
In-class activity
Homework:
Cronk: Read Section 7.2 on Chi-Square of Independence
Holcomb: Exercises 55, 56, 57 and 59 (Exercises 55-57 are on Chi-Square, 59 is on Odds Ratio)