Categorical Data Analysis: Logistic Regression and Log-Linear Regression 26 Nov 2010 CPSY501 Dr. Sean Ho Trinity Western University For discussion: Myers.

Slides:



Advertisements
Similar presentations
Sociology 680 Multivariate Analysis Logistic Regression.
Advertisements

Analysis of Categorical Data Nick Jackson University of Southern California Department of Psychology 10/11/
Contingency Tables Chapters Seven, Sixteen, and Eighteen Chapter Seven –Definition of Contingency Tables –Basic Statistics –SPSS program (Crosstabulation)
Anita M. Baker, Ed.D. Jamie Bassell Evaluation Services Program Evaluation Essentials Evaluation Support 2.0 Session 2 Bruner Foundation Rochester, New.
Chi-square A very brief intro. Distinctions The distribution The distribution –Chi-square is a probability distribution  A special case of the gamma.
Log-linear Analysis - Analysing Categorical Data
PSYC512: Research Methods PSYC512: Research Methods Lecture 19 Brian P. Dyre University of Idaho.
Chi Square Test Dealing with categorical dependant variable.
Chi-square Test of Independence
Log-linear and logistic models
Handling Categorical Data. Learning Outcomes At the end of this session and with additional reading you will be able to: – Understand when and how to.
Summary of Quantitative Analysis Neuman and Robson Ch. 11
Review for Exam 2 Some important themes from Chapters 6-9 Chap. 6. Significance Tests Chap. 7: Comparing Two Groups Chap. 8: Contingency Tables (Categorical.
C. Logit model, logistic regression, and log-linear model A comparison.
SW388R7 Data Analysis & Computers II Slide 1 Multiple Regression – Basic Relationships Purpose of multiple regression Different types of multiple regression.
Statistics Idiots Guide! Dr. Hamda Qotba, B.Med.Sc, M.D, ABCM.
1 of 27 PSYC 4310/6310 Advanced Experimental Methods and Statistics © 2013, Michael Kalsher Michael J. Kalsher Department of Cognitive Science Adv. Experimental.
CPSY 501: Lecture 11, Nov14 Non-Parametric Tests Between subjects:
AS 737 Categorical Data Analysis For Multivariate
Categorical Data Prof. Andy Field.
Inferential Statistics: SPSS
Wednesday PM  Presentation of AM results  Multiple linear regression Simultaneous Simultaneous Stepwise Stepwise Hierarchical Hierarchical  Logistic.
Categorical Data Analysis School of Nursing “Categorical Data Analysis 2x2 Chi-Square Tests and Beyond (Multiple Categorical Variable Models)” Melinda.
Meta-Analysis, Generalized Linear Models, and Categorical Data Analysis 19 Nov 2010 CPSY501 Dr. Sean Ho Trinity Western University Download: Hill & Lent.
Logit model, logistic regression, and log-linear model A comparison.
POTH 612A Quantitative Analysis Dr. Nancy Mayo. © Nancy E. Mayo A Framework for Asking Questions Population Exposure (Level 1) Comparison Level 2 OutcomeTimePECOT.
ALISON BOWLING THE GENERAL LINEAR MODEL. ALTERNATIVE EXPRESSION OF THE MODEL.
Statistical analysis Prepared and gathered by Alireza Yousefy(Ph.D)
Social Science Research Design and Statistics, 2/e Alfred P. Rovai, Jason D. Baker, and Michael K. Ponton Pearson Chi-Square Contingency Table Analysis.
Advanced statistics for master students Loglinear models.
Complex Analytic Designs. Outcomes (DVs) Predictors (IVs)1 ContinuousMany Continuous1 CategoricalMany Categorical None(histogram)Factor Analysis: PCA,
Assessing Binary Outcomes: Logistic Regression Peter T. Donnan Professor of Epidemiology and Biostatistics Statistics for Health Research.
Analysis of Qualitative Data Dr Azmi Mohd Tamil Dept of Community Health Universiti Kebangsaan Malaysia FK6163.
Nonparametric Tests: Chi Square   Lesson 16. Parametric vs. Nonparametric Tests n Parametric hypothesis test about population parameter (  or  2.
Mixed-Design ANOVA 5 Nov 2010 CPSY501 Dr. Sean Ho Trinity Western University Please download: treatment5.sav.
CHI SQUARE TESTS.
HYPOTHESIS TESTING BETWEEN TWO OR MORE CATEGORICAL VARIABLES The Chi-Square Distribution and Test for Independence.
The “Big 4” – Significance, Effect Size, Sample Size, and Power 18 Sep 2009 Dr. Sean Ho CPSY501 cpsy501.seanho.com.
Fundamental Statistics in Applied Linguistics Research Spring 2010 Weekend MA Program on Applied English Dr. Da-Fu Huang.
CPSY 501: Lecture 12, Nov 21 Review non-parametric tests Categorical analysis: χ 2, Log-Linear Meta-analysis: e.g., Hill & Lent article Review: Cycles.
Statistical Analysis. Z-scores A z-score = how many standard deviations a score is from the mean (-/+) Z-scores thus allow us to transform the mean to.
N318b Winter 2002 Nursing Statistics Specific statistical tests Chi-square (  2 ) Lecture 7.
Factorial ANOVA Repeated-Measures ANOVA 6 Nov 2009 CPSY501 Dr. Sean Ho Trinity Western University Please download: Treatment5.sav MusicData.sav For next.
Multiple Regression  Similar to simple regression, but with more than one independent variable R 2 has same interpretation R 2 has same interpretation.
Log-linear Models HRP /03/04 Log-Linear Models for Multi-way Contingency Tables 1. GLM for Poisson-distributed data with log-link (see Agresti.
12/23/2015Slide 1 The chi-square test of independence is one of the most frequently used hypothesis tests in the social sciences because it can be used.
Chapter 15 The Chi-Square Statistic: Tests for Goodness of Fit and Independence PowerPoint Lecture Slides Essentials of Statistics for the Behavioral.
Ch15: Multiple Regression 3 Nov 2011 BUSI275 Dr. Sean Ho HW7 due Tues Please download: 17-Hawlins.xls 17-Hawlins.xls.
1 Week 3 Association and correlation handout & additional course notes available at Trevor Thompson.
Power Analysis: Sample Size, Effect Size, Significance, and Power 16 Sep 2011 Dr. Sean Ho CPSY501 cpsy501.seanho.com Please download: SpiderRM.sav.
Categorical Data Analysis and Meta-Analysis 20 Nov 2009 CPSY501 Dr. Sean Ho Trinity Western University Download: GenderDepr.sav Fitzpatrick et al. Hill.
Nonparametric Statistics
Week 7: General linear models Overview Questions from last week What are general linear models? Discussion of the 3 articles.
Mixed-Design ANOVA 13 Nov 2009 CPSY501 Dr. Sean Ho Trinity Western University Please download: treatment5.sav.
Beginners statistics Assoc Prof Terry Haines. 5 simple steps 1.Understand the type of measurement you are dealing with 2.Understand the type of question.
STATISTICAL TESTS USING SPSS Dimitrios Tselios/ Example tests “Discovering statistics using SPSS”, Andy Field.
Chapter 4 Selected Nonparemetric Techniques: PARAMETRIC VS. NONPARAMETRIC.
Categorical Data Analysis and Meta-Analysis
Nonparametric Statistics
Research in Social Work Practice Salem State University
Categorical Data Aims Loglinear models Categorical data
Basic Statistics Overview
Qualitative data – tests of association
Summarising and presenting data - Bivariate analysis
Nonparametric Statistics
The Chi-Square Distribution and Test for Independence
females males Analyses with discrete variables
BIVARIATE ANALYSIS: Measures of Association Between Two Variables
Lexico-grammar: From simple counts to complex models
Presentation transcript:

Categorical Data Analysis: Logistic Regression and Log-Linear Regression 26 Nov 2010 CPSY501 Dr. Sean Ho Trinity Western University For discussion: Myers & Hayes Horowitz For the lecture: GenderDepr.sav Fitzpatrick et al.

26 Nov 2010CPSY501: logistic, log-linear2 Outline for today Linear models: Logistic regression Log-linear regression Categorical Data Analysis 2 vars: chi-squared test, effect sizes Multiple vars: log-linear analysis Example: Fitzpatrick '01

26 Nov 2010CPSY501: logistic, log-linear3 Generalized Linear Model To deal with a categorical DV, we need the Generalized Linear Model: f( Y ) ~ X 1 + X 2 + … The linear model predicts not Y directly, but the link function f() applied to Y Examples of link functions: f(Y) = log(Y): log-linear regression Used when Y represents counts/frequencies f(Y) = logit(Y): logistic regression Used when Y represents a probability (0..1)

26 Nov 2010CPSY501: logistic, log-linear4 GLM: log-linear regress. When DV is counts/frequencies, its distribution is often not normal, but Poisson e.g., DV = # violent altercations If mean is large, Poisson → normal e.g., “log( violent_alts ) ~ depression” residuals (ε) are also Poisson distributed Log-linear is also used to look at many cat. vars IVs are all categorical (factorial cells) DV = # people in each cell Fitzpatrick, et al. example paper later roymech.co.uk

26 Nov 2010CPSY501: logistic, log-linear5 GLM: logistic regression When DV is a probability (0 to 1), the distribution is binomial e.g., DV = “likelihood to develop depress.” Probability of Y: P(Y). Odds of Y: Logit link function: logit(Y) = log( odds(Y) ) Also works for DV = # out of total e.g., DV = “# correct out of 100” As #tot → ∞, binomial → Poisson Also works for binary (dichot.) DV e.g., DV = “is pregnant” zoonek2.free.fr Princeton WWS 509

26 Nov 2010CPSY501: logistic, log-linear6 Outline for today Linear models: Logistic regression Log-linear regression Categorical Data Analysis 2 vars: chi-squared test, effect sizes Multiple vars: log-linear analysis Example: Fitzpatrick '01

26 Nov 2010CPSY501: logistic, log-linear7 Contingency tables When comparing two categorical variables, all observations can be partitioned into cells of the contingency table e.g., two dichotomous variables: 2x2 table Gender vs. clinically depressed: DepressedNot Depressed Female Male98122 RQ: is there a significant relationship between gender and depression?

26 Nov 2010CPSY501: logistic, log-linear8 SPSS: frequency data Usually, each row in the Data View represents one participant In this case, we'd have 500 rows For our example, each row will represent one cell of the contingency table, and we will specify the frequency for each cell Open: GenderDepr.sav Data → Weight Cases: Weight Cases by Select “Frequency” as Frequency Variable

26 Nov 2010CPSY501: logistic, log-linear9 2 categorical vars: χ 2 and φ Chi-squared (χ 2 ) test: Two categorical variables Asks: is there a significant relationship? Requirements on expected cell counts: No cells have expected count ≤ 1, and <20% of cells have expected count < 5 Else (for few counts) use Fisher's exact test Effect size: φ is akin to correlation: definition: φ 2 = χ 2 / n Cramer's V extends φ for more than 2 levels Odds ratio: #yes / #no

26 Nov 2010CPSY501: logistic, log-linear10 SPSS: χ 2 and φ Analyze → Descriptives → Crosstabs: One var goes in Row(s), one in Column(s) Cells: Counts: Observed, Expected, and Residuals: Standardized, may also want Percentages: Row, Column, and Total Statistics: Chi-square, Phi and Cramer's V Exact: Fisher's exact test: best for small counts, computationally intensive If χ 2 is significant, use standardized residuals (z-scores) to follow-up which categories differ

26 Nov 2010CPSY501: logistic, log-linear11 Reporting χ 2 results As in ANOVA, IVs with several categories require follow-up analysis to determine which categories show the effect The equivalent of a single pairwise comparison is a 2x2 contingency table! Report: “There was a significant association between gender and depression, χ 2 (1) = ___, p <.001. Females were twice as likely to have depression as males.” Odds ratio: (#F w/depr) / (#M w/depr)

26 Nov 2010CPSY501: logistic, log-linear12 Outline for today Linear models: Logistic regression Log-linear regression Categorical Data Analysis 2 vars: chi-squared test, effect sizes Multiple vars: log-linear analysis Example: Fitzpatrick '01

26 Nov 2010CPSY501: logistic, log-linear13 Many categorical variables Need not have IV/DV distinction Use log-linear: Generalized Linear Model Include all the categorical vars as IVs DV = # people in each cell e.g., “count ~ employment * gender * depr” Look for moderation / interactions: e.g., employment * gender * depression Then lower-level interactions and main effects e.g., employment * depression

26 Nov 2010CPSY501: logistic, log-linear14 Goodness of Fit Two χ 2 metrics measure how well our model (expected counts) fits the data (observed): Pearson χ 2 and likelihood ratio (G) (likelihood ratio is preferred for small n) Significance test looks for deviation of observed counts from expected (model) So if our model fits the data well, then the Pearson and likelihood ratio should be small, and the test should be non-significant SPSS tries removing various effects to find the simplest model that still fits the data well

26 Nov 2010CPSY501: logistic, log-linear15 Hierarchical Backward Select'n By default, SPSS log-linear regression uses automatic hierarchical “backward” selection: Starts with all main effects and all interactions For a “saturated” categorical model, all cells in contingency table are modelled, so the “full- factorial” model fits the data perfectly: likelihood ratio is 0 and p-value = 1.0. Then removes effects one at a time, starting with higher-order interactions first: Does it have a significant effect on fit? How much does fit worsen? ( Δ G)

26 Nov 2010CPSY501: logistic, log-linear16 Example: Fitzpatrick et al.  Fitzpatrick, M., Stalikas, A., Iwakabe, S. (2001). Examining Counselor Interventions and Client Progress in the Context of the Therapeutic Alliance. Psychotherapy, 38(2), Exploratory design with 3 categorical variables, coded from session recordings / transcripts: Counsellor interventions (VRM) Client good moments (GM) Strength of working alliance (WAI) Therapy: 21 sessions, male & female clients & therapists, expert therapists, diverse models.

26 Nov 2010CPSY501: logistic, log-linear17 Fitzpatrick: Research Question RQ: For expert therapists, what associations exist amongst VRM, GM, and WAI? Therapist Verbal Response Modes: 8 categories: encouragement, reflection, self- disclosure, guidance, etc. Client Good Moments: Significant (I)nformation, (E)xploratory, or (A)ffective-Expressive Working Alliance Inventory Observer rates: low, moderate, high

26 Nov 2010CPSY501: logistic, log-linear18 Fitzpatrick: Abstract Client “good moments” did not necessarily increase with Alliance Different interventions fit with good moments of client information (GM-I) at different Alliance levels. “Qualitatively different therapeutic processes are in operation at different Alliance levels.” Explain each statement and how it summarizes the results.

26 Nov 2010CPSY501: logistic, log-linear19 Top-down Analysis: Interaction As in ANOVA and Regression, Loglinear analysis starts with the most complex interaction (“highest order”) and tests if it adds incrementally to the overall model fit Compare with ΔR 2 in regression analysis Interpretation focuses on: 3-way interaction: VRM * GM * WAI Then the 2-way interactions: GM * WAI, etc. Fitzpatrick did separate analyses for each of the three kinds of good moments: GM-I, GM-E, GM-A

26 Nov 2010CPSY501: logistic, log-linear20 Results: Interactions 2-way CGM-E x WAI interaction: Exploratory Good Moments tended to occur more frequently in High Alliance sessions 2-way WAI x VRM interaction: Structured interventions (guidance) take place in Hi or Lo Alliance sessions, while Unstructured interventions (reflection) are higher in Moderate Alliance sessions Describes shared features of “working through” and “working with” clients, different functions of safety & guidance.

26 Nov 2010CPSY501: logistic, log-linear21

26 Nov 2010CPSY501: logistic, log-linear22 Formatting Tables in MS-Word Use the “insert table” and “table properties” functions of Word to build your tables; don’t do it manually. General guidelines for table formatting can be found on pages of the APA manual. Additional tips and examples: see NCFR site: In particular, pay attention to the column alignment article, for how to get your numbers to align according to the decimal point.