Handling Categorical Data

Learning Outcomes
At the end of this session, and with additional reading, you will be able to:
– understand when and how to analyse frequency counts

Analysing categorical variables
Frequencies – the number of observations within a given category.

Assumptions of chi-squared
– Each observation contributes to only one cell of the contingency table
– The expected frequencies should be greater than 5

Chi Squared II
Pearson's chi-squared assesses the difference between the observed and expected frequencies in each cell. This is achieved by calculating the expected value for each cell:
Model = (RT × CT) / N
where RT is the row total, CT is the column total and N is the total number of observations.
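
A minimal sketch of this calculation in Python (the 2 × 2 counts below are invented purely for illustration; they are not the case-study figures):

import numpy as np

# Hypothetical 2 x 2 contingency table of counts
# rows: ethnicity (black, white); columns: age (under 18, over 18)
observed = np.array([[40, 45],
                     [25, 38]])

row_totals = observed.sum(axis=1, keepdims=True)   # RT for each row
col_totals = observed.sum(axis=0, keepdims=True)   # CT for each column
n = observed.sum()                                 # N, the total number of observations

expected = row_totals * col_totals / n             # Model = (RT x CT) / N for every cell
print(expected)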

Chi Squared III
– Likelihood ratio: a comparison of the observed frequencies with those predicted by the model (the expected frequencies)
– Yates' correction: with a 2 x 2 contingency table, Pearson's chi-squared can produce a Type I error. The correction subtracts 0.5 from each deviation before squaring it, which makes the test statistic smaller and the result less likely to be significant.
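
In software outside SPSS these variants are usually single options; as a hedged sketch on the same invented table, SciPy's chi2_contingency applies Yates' correction to 2 x 2 tables with correction=True and returns the likelihood-ratio (G) statistic when lambda_="log-likelihood" is passed:

from scipy.stats import chi2_contingency
import numpy as np

observed = np.array([[40, 45],
                     [25, 38]])   # hypothetical counts, as before

# Pearson's chi-squared with Yates' continuity correction (applied only to 2 x 2 tables)
chi2, p, dof, expected = chi2_contingency(observed, correction=True)
print("Pearson (Yates-corrected):", chi2, p)

# Likelihood-ratio (G) statistic instead of Pearson's statistic
g, p_g, dof_g, _ = chi2_contingency(observed, correction=False, lambda_="log-likelihood")
print("Likelihood ratio:", g, p_g)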

The contingency table I
Using my case study on stop and search, suppose we wanted to ascertain whether black males were stopped more often in one month than white males.
One variable – black or white male
– What does this tell us?

One-way Chi Squared
In a simple one-way chi-squared, if we had 148 people we would expect them to be evenly split between white and black males, so the expected value for each group would be 74.
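
A brief sketch of the one-way test in Python (the split of the 148 stops into 85 and 63 is assumed for illustration only):

from scipy.stats import chisquare

observed = [85, 63]   # hypothetical counts of black and white males among the 148 stops

# With no expected frequencies supplied, chisquare assumes an even split (74 and 74)
stat, p = chisquare(observed)
print(stat, p)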

One-way Chi Squared

SPSS output

The contingency table II
It would be more useful to look at an additional variable, let's say age.
Two variables:
– Males – black/white
– Age – under 18/over 18

The contingency table II
          Under 18    Over 18    Total
Black
White
Total

Example
Now, using the formula, calculate the expected values for the contingency table:
Model = (RT × CT) / N
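
As a cross-check on the hand calculation, chi2_contingency also returns the full table of expected frequencies (again using the invented counts rather than the slide's real figures):

from scipy.stats import chi2_contingency
import numpy as np

observed = np.array([[40, 45],
                     [25, 38]])   # hypothetical ethnicity x age counts

chi2, p, dof, expected = chi2_contingency(observed)
print("Expected frequencies:\n", expected)
print("chi2 =", chi2, "p =", p)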

SPSS output

Effect size
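
Assuming the effect size intended here is Cramér's V (phi for a 2 × 2 table) – the measure usually reported alongside a chi-squared test on a contingency table – a minimal sketch is:

import numpy as np
from scipy.stats import chi2_contingency

def cramers_v(table):
    # Cramer's V: sqrt(chi2 / (N x (min(rows, columns) - 1)))
    table = np.asarray(table)
    chi2, _, _, _ = chi2_contingency(table, correction=False)
    n = table.sum()
    k = min(table.shape) - 1
    return np.sqrt(chi2 / (n * k))

print(cramers_v([[40, 45], [25, 38]]))   # hypothetical counts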

Odds ratio
The ratio of the odds of an event occurring in one group to the odds of it occurring in another group – a common way of expressing the size of an association in a 2 x 2 table.
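
The odds ratio for a 2 × 2 table can be computed directly; the table below is invented for illustration (rows: group, columns: outcome):

import numpy as np

table = np.array([[40, 60],    # hypothetical: group 1 – 40 stopped, 60 not stopped
                  [25, 75]])   # hypothetical: group 2 – 25 stopped, 75 not stopped

odds_1 = table[0, 0] / table[0, 1]   # odds of being stopped in group 1
odds_2 = table[1, 0] / table[1, 1]   # odds of being stopped in group 2
print(odds_1 / odds_2)               # 2.0: the odds are twice as high in group 1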

Loglinear analysis
Loglinear analysis works by backward elimination of a model: the saturated model is fitted first, then predictors are removed.
– Just like an ANOVA, a loglinear analysis assesses the relationship between all the variables and describes the outcomes in terms of interactions.

Loglinear analysis II
In our previous example we had two variables – ethnicity and age. If we now add the reason for the stop and search, a loglinear analysis will first assess the three-way interaction and then assess the various two-way interactions.
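
Outside SPSS, a loglinear model can be sketched as a Poisson regression on the cell counts; the statsmodels example below uses a made-up three-way table (ethnicity × age × reason), not the case-study data:

import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Made-up cell counts for a 2 x 2 x 2 table (ethnicity x age x reason)
cells = pd.DataFrame({
    "ethnicity": ["black"] * 4 + ["white"] * 4,
    "age":       ["under18", "under18", "over18", "over18"] * 2,
    "reason":    ["drugs", "weapons"] * 4,
    "count":     [20, 15, 30, 20, 18, 10, 25, 10],
})

# Saturated loglinear model: Poisson regression of the counts on all main
# effects and interactions (the * operator expands to exactly these terms)
saturated = smf.glm("count ~ ethnicity * age * reason",
                    data=cells, family=sm.families.Poisson()).fit()
print(saturated.summary())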

Assumptions of loglinear analysis
Similar to those of chi-squared:
– observations should fall into one category alone
– no more than 20% of cells should have expected frequencies less than 5
– all cells must have frequencies greater than 1
If you don't meet these assumptions you need to decide whether to proceed with the analysis or collapse the data across variables.

Output I
– Number of cases: should equal the total number of observations
– Number of factors (variables)
– Number of levels (sub-divisions within each variable)
– Saturated model: the maximum interaction possible with the observed frequencies
– Goodness-of-fit and likelihood ratio statistics: test whether the expected frequencies are significantly different from the observed – these should be non-significant if the model is a good fit

Output II
– The goodness-of-fit statistic is preferred for large samples; the likelihood ratio is preferred for small samples
– "K-way and higher-order effects" asks: if you remove the highest-order interaction, will the fit of the model be affected? The next k-way effect asks whether removing the highest order followed by the next order affects the fit – and so on until all effects are removed

Output III
– "K-way effects are zero" asks the opposite – that is, whether removing the main effects will have an effect on the model
– The final step is the backward elimination: the analysis keeps removing effects until no more can be eliminated, and reports the best-fitting model it has generated
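
To make the elimination step concrete, here is a hedged sketch (a likelihood-ratio comparison in statsmodels on the made-up table from the earlier example, not SPSS's own backward-elimination output) of testing whether the three-way interaction can be dropped:

import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
from scipy import stats

# Same made-up 2 x 2 x 2 table of cell counts as in the earlier sketch
cells = pd.DataFrame({
    "ethnicity": ["black"] * 4 + ["white"] * 4,
    "age":       ["under18", "under18", "over18", "over18"] * 2,
    "reason":    ["drugs", "weapons"] * 4,
    "count":     [20, 15, 30, 20, 18, 10, 25, 10],
})

# Saturated model (all interactions) and a reduced model without the three-way term
saturated = smf.glm("count ~ ethnicity * age * reason",
                    data=cells, family=sm.families.Poisson()).fit()
reduced = smf.glm("count ~ (ethnicity + age + reason) ** 2",
                  data=cells, family=sm.families.Poisson()).fit()

# Likelihood-ratio test: the change in deviance is compared to a chi-squared distribution
lr_stat = reduced.deviance - saturated.deviance
df_diff = saturated.df_model - reduced.df_model
print(lr_stat, stats.chi2.sf(lr_stat, df_diff))   # non-significant p => drop the three-way term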

Now let's try one