Correspondence Analysis

Correspondence Analysis Correspondence analysis is a descriptive/exploratory technique designed to analyse simple two-way and multi-way tables containing some measure of correspondence between the rows and columns. The results provide information similar in nature to that produced by factor analysis techniques, and they allow one to explore the structure of the categorical variables included in the table. The most common table of this type is the two-way frequency cross-tabulation. Mike Cox, Newcastle University, me fecit 24/11/2014 Sunday, 16 April 2017 9:12 PM

Correspondence Analysis Correspondence analysis (CA) may be defined as a special case of principal components analysis (PCA) of the rows and columns of a table, especially applicable to a cross-tabulation. However, CA and PCA are used under different circumstances. Principal components analysis is used for tables consisting of continuous measurements, whereas correspondence analysis is applied to contingency tables (i.e. cross-tabulations). The primary goal of correspondence analysis is to transform a table of numerical information into a graphical display, in which each row and each column is depicted as a point.

Correspondence Analysis In a typical correspondence analysis, a cross-tabulation table of frequencies is first standardised, so that the relative frequencies across all cells sum to 1.0. One way to state the goal of a typical analysis is to represent the entries in the table of relative frequencies in terms of the distances between individual rows and/or columns in a low-dimensional space. There are several parallels in interpretation between correspondence analysis and factor analysis.
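
The standardisation step described above can be sketched in a few lines of Python. This is only an illustration: the 4 x 5 table of counts below is hypothetical (it is not the region by politics data used in these slides), and NumPy is assumed to be available.

```python
import numpy as np

# Hypothetical 4 x 5 table of counts (4 regions x 5 political outlooks).
# Illustrative numbers only, not the data analysed in the slides.
N = np.array([
    [20, 30, 40, 25, 15],
    [15, 25, 50, 35, 25],
    [10, 20, 45, 45, 40],
    [25, 30, 35, 20, 10],
], dtype=float)

n = N.sum()   # grand total of the table
P = N / n     # relative frequencies; all cells together sum to 1.0

print(P.round(3))
print("sum of relative frequencies:", P.sum())
```

The matrix P is the starting point for every later quantity (profiles, masses, inertia), so the later sketches rebuild it from the same hypothetical counts.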

Correspondence Analysis Correspondence Analysis Applied to Psychological Research L. Doey and J. Kurta Tutorials in Quantitative Methods for Psychology 2011, Vol. 7(1), p. 5-14.

Correspondence Analysis An Introduction to Correspondence Analysis P.M. Yelland The Mathematica Journal 2010, Vol. 12, p. 1-23.

Correspondence Analysis Correspondence analysis is a useful tool to uncover the relationships among categorical variables N. Sourial, C. Wolfson, B. Zhu, J. Quail, J. Fletcher, S. Karunananthan, K. Bandeen-Roche, F. Béland and H. Bergman Journal of Clinical Epidemiology 2010, Vol. 63(6), p. 638-646.

Correspondence Analysis The data summarises individuals' political affiliation (1,…,5) and geographic region (1,…,4). Political affiliation codes: 1 Liberal, 2 Tend Lib, 3 Moderate, 4 Tend Cons, 5 Conservative.

Correspondence Analysis The data summarises individuals' political affiliation (1,…,5) and geographic region (1,…,4). Region codes: 1 Northeast, 2 Midwest, 3 South, 4 West.

Correspondence Analysis The data (a) summarises individuals' political affiliation (1,…,5) and geographic region (1,…,4). There are 725 rows of data.

Correspondence Analysis Analyze > Dimension Reduction > Correspondence Analysis

Correspondence Analysis Select the row and column variables and define their ranges. Having defined the ranges, use the buttons at the side of the screen to set the desired parameters.

Correspondence Analysis Define Row Range. Select the row bounds, then Update and then Continue. There are 4 regions.

Correspondence Analysis Define Column Range. Select the column bounds, then Update and then Continue. There are 5 political affiliations.

Correspondence Analysis Finally, use the buttons at the side of the screen to set the desired parameters.

Correspondence Analysis Select Statistics

Correspondence Analysis Select Plots

Correspondence Analysis Finally, use the OK button to run the analysis, or Paste to preserve the syntax.
Syntax:
CORRESPONDENCE TABLE = region4(1 4) BY politics(1 5)
  /DIMENSIONS = 2
  /MEASURE = CHISQ
  /STANDARDIZE = RCMEAN
  /NORMALIZATION = SYMMETRICAL
  /PRINT = TABLE RPOINTS CPOINTS RPROFILES CPROFILES RCONF CCONF
  /PLOT = NDIM(1,MAX) BIPLOT(20) RPOINTS(20) CPOINTS(20) TRROWS(20) TRCOLUMNS(20) .

Correspondence Analysis The Correspondence Table is simply the cross-tabulation of the row and column variables, including the row and column marginal totals, serving as input.
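
For readers working outside SPSS, the cross-tabulation with marginal totals can be sketched as follows. The ten coded respondents are hypothetical stand-ins for the 725 rows of raw data, and the variable names `region` and `politics` are chosen here for illustration.

```python
import numpy as np

# Hypothetical raw codes, one (region, politics) pair per respondent.
# Region is coded 1..4 and politics 1..5, as in the slides.
region   = np.array([1, 1, 2, 3, 3, 3, 4, 2, 1, 4])
politics = np.array([1, 3, 3, 5, 4, 5, 1, 2, 3, 2])

# Count how many respondents fall in each (region, politics) cell.
table = np.zeros((4, 5), dtype=int)
np.add.at(table, (region - 1, politics - 1), 1)

# Append the row totals as a last column and the column totals as a last row.
with_margins = np.zeros((5, 6), dtype=int)
with_margins[:4, :5] = table
with_margins[:4, 5] = table.sum(axis=1)   # row marginals
with_margins[4, :5] = table.sum(axis=0)   # column marginals
with_margins[4, 5] = table.sum()          # grand total n

print(with_margins)
```

`np.add.at` is used rather than plain indexed assignment so that repeated (region, politics) pairs are accumulated correctly.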

Correspondence Analysis The Row Profiles are the cell contents divided by their corresponding row total (e.g. 19/131=0.145 for the first cell). This table also shows the column masses (column marginals as a proportion of n) (e.g. 93/725=0.128). These are intermediate calculations on the way toward computing distances between points. Note the column of 1’s.

Correspondence Analysis Column Profiles are the cell elements divided by the column marginals (e.g. 19/93=0.204). This table also shows the row masses (row marginals as a proportion of n) (e.g. 131/725=0.181). These are intermediate calculations on the way toward computing distances between points. Note the row of 1’s.
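
The profile and mass calculations in the two tables above can be sketched as below. The counts are hypothetical (not the slide data); the point is only to show which total each cell is divided by.

```python
import numpy as np

# Hypothetical 4 x 5 table of counts; illustrative numbers only.
N = np.array([
    [20, 30, 40, 25, 15],
    [15, 25, 50, 35, 25],
    [10, 20, 45, 45, 40],
    [25, 30, 35, 20, 10],
], dtype=float)

n = N.sum()
row_totals = N.sum(axis=1)
col_totals = N.sum(axis=0)

# Row profiles: each cell divided by its row total, so every row sums to 1.
row_profiles = N / row_totals[:, None]
# Column profiles: each cell divided by its column total, so every column sums to 1.
col_profiles = N / col_totals[None, :]

# Masses: marginal totals as proportions of n.
row_masses = row_totals / n
col_masses = col_totals / n

print(row_profiles.round(3))
print(col_masses.round(3))
```

One useful check is that the mass-weighted average of the row profiles reproduces the column masses, which is why the column masses are printed alongside the row profiles.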

Correspondence Analysis In the Summary table, we first look at the chi‑square value and see that it is significant, which supports the conclusion that the two variables are related.
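
A minimal sketch of the chi-square statistic behind this check, again on a hypothetical table rather than the slide data. It also computes the total inertia, which equals the chi-square statistic divided by n. The critical value 21.026 is the tabulated 5% point of the chi-square distribution with 12 degrees of freedom.

```python
import numpy as np

# Hypothetical 4 x 5 table of counts; illustrative numbers only.
N = np.array([
    [20, 30, 40, 25, 15],
    [15, 25, 50, 35, 25],
    [10, 20, 45, 45, 40],
    [25, 30, 35, 20, 10],
], dtype=float)

n = N.sum()
# Expected counts under independence: (row total * column total) / n.
expected = np.outer(N.sum(axis=1), N.sum(axis=0)) / n

chi2 = ((N - expected) ** 2 / expected).sum()
dof = (N.shape[0] - 1) * (N.shape[1] - 1)   # (4 - 1) * (5 - 1) = 12
total_inertia = chi2 / n

CRITICAL_5PCT_DF12 = 21.026   # tabulated value, valid for 12 degrees of freedom only
print("chi-square:", round(chi2, 3), "df:", dof)
print("total inertia:", round(total_inertia, 4))
print("significant at 5%:", chi2 > CRITICAL_5PCT_DF12)
```

An exact p-value would need the chi-square distribution function from a statistics library; it is left out here to keep the sketch dependent on NumPy alone.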

Correspondence Analysis SPSS has computed the interpoint distances and subjected the distance matrix to principal components analysis, yielding in this case three dimensions.

Correspondence Analysis Only the interpretable dimensions are reported, not the full solution, which is why the eigenvalues (labelled Inertia; these are the proportions of variance explained by each dimension) add to something less than 100%: in this case only 0.057 = 5.7%. This reflects the fact that the association between region and political outlook, while significant, is weak.

Correspondence Analysis The eigenvalues (called inertia here) reflect the relative importance of each dimension, with the first always being the most important, the next second most important, etc.

Correspondence Analysis The singular values are simply the square roots of the eigenvalues. They are interpreted as the maximum canonical correlation between the categories of the variables in the analysis for any given dimension.

Correspondence Analysis Note that the "Proportion of Inertia" columns are the dimension eigenvalues divided by the total (table) eigenvalue. That is, they are the proportion of the explained variance that each dimension accounts for: thus the first dimension explains 62.7% of the 5.7% of the variance explained by the model.

Correspondence Analysis The standard deviation columns refer back to the singular values and help the researcher assess the relative precision of each dimension.
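
The relationship between singular values, inertia and proportion of inertia can be sketched with a singular value decomposition of the standardised residuals. This is the standard textbook formulation of correspondence analysis, not necessarily the exact routine SPSS runs, and the counts are hypothetical.

```python
import numpy as np

# Hypothetical 4 x 5 table of counts; illustrative numbers only.
N = np.array([
    [20, 30, 40, 25, 15],
    [15, 25, 50, 35, 25],
    [10, 20, 45, 45, 40],
    [25, 30, 35, 20, 10],
], dtype=float)

P = N / N.sum()        # relative frequencies
r = P.sum(axis=1)      # row masses
c = P.sum(axis=0)      # column masses

# Standardised residuals from the independence model r * c'.
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))

# A 4 x 5 table has at most min(4, 5) - 1 = 3 non-trivial dimensions.
k = min(N.shape) - 1
singular_values = np.linalg.svd(S, compute_uv=False)[:k]

inertia = singular_values ** 2          # eigenvalue of each dimension
total_inertia = inertia.sum()           # equals chi-square / n
proportion = inertia / total_inertia    # "Proportion of Inertia" column
cumulative = np.cumsum(proportion)

for d in range(k):
    print(d + 1, singular_values[d].round(4), inertia[d].round(4),
          proportion[d].round(3), cumulative[d].round(3))
```

The proportions always sum to 1 across the full set of dimensions, whatever the size of the total inertia, which is why a large first proportion says nothing by itself about the strength of the association.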

Correspondence Analysis Keyword interpretations Mass: the marginal proportions of the row variable, used to weight the point profiles when computing point distance. This weighting has the effect of compensating for unequal numbers of cases. Scores in dimension: scores used as coordinates for points when plotting the correspondence map. Each point has a score on each dimension. Inertia: Variance

Correspondence Analysis Contribution of points to dimensions: as factor loadings are used in conventional factor analysis to ascribe meaning to dimensions, so "contribution of points to dimensions" is used to intuit the meaning of correspondence dimensions. Contribution of dimensions to points: these are multiple correlations, which reflect how well the principal components model is explaining any given point (category).
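
The two kinds of contribution can be sketched as follows for the row points. The sketch assumes principal coordinates computed from the SVD of the standardised residuals and uses hypothetical counts; the variable names are chosen here for illustration and do not correspond to SPSS output labels.

```python
import numpy as np

# Hypothetical 4 x 5 table of counts; illustrative numbers only.
N = np.array([
    [20, 30, 40, 25, 15],
    [15, 25, 50, 35, 25],
    [10, 20, 45, 45, 40],
    [25, 30, 35, 20, 10],
], dtype=float)

P = N / N.sum()
r = P.sum(axis=1)      # row masses
c = P.sum(axis=0)      # column masses
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))

U, sv, Vt = np.linalg.svd(S, full_matrices=False)
k = min(N.shape) - 1               # keep the 3 non-trivial dimensions
U, sv = U[:, :k], sv[:k]

# Principal coordinates of the row points (one column per dimension).
F = (U / np.sqrt(r)[:, None]) * sv

# Inertia carried by each row point across all dimensions.
weighted_sq = r[:, None] * F ** 2
point_inertia = weighted_sq.sum(axis=1)

# Contribution of point to inertia of dimension: each column sums to 1.
point_to_dim = weighted_sq / sv ** 2
# Contribution of dimension to inertia of point: each row sums to 1.
dim_to_point = weighted_sq / point_inertia[:, None]

print(point_to_dim.round(3))
print(dim_to_point.round(3))
```

Reading down a column of `point_to_dim` shows which categories define a dimension; reading across a row of `dim_to_point` shows how well each dimension represents that category.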

Correspondence Analysis The Overview Row Points table, for each row point in the correspondence table, displays the mass, scores in dimension, inertia, contribution of the point to the inertia of the dimension, and contribution of the dimension to the inertia of the point. Overview Row Points

Correspondence Analysis The Overview Column Points table is similar to the previous one, except for the column variable (party rather than region) in the correspondence table. Overview Column Points

Correspondence Analysis The Confidence Row Points tables display the standard deviations of the row scores (the values used as coordinates to plot the correspondence map) and are used to assess their precision.

Correspondence Analysis The Confidence Column Points tables display the standard deviations of the column scores (the values used as coordinates to plot the correspondence map) and are used to assess their precision.

Correspondence Analysis The plots of transformed categories for dimensions display a plot of the transformation of the row category values and of column category values into scores in dimension, with one plot per dimension. The x-axis has the category values and the y-axis has the corresponding dimension scores. Thus the category "Northeast" in the Overview Row Points table above had a score in dimension of -0.702, as shown on the plot.

Correspondence Analysis Refer back to “Overview Row Points” dimension 1 Why join!

Correspondence Analysis Refer back to “Overview Row Points” dimension 2

Correspondence Analysis Refer back to “Overview Column Points” dimension 1

Correspondence Analysis Refer back to “Overview Column Points” dimension 2

Correspondence Analysis The uniplots for the row and column variables. Note that the origin of the axes is slightly different in the two plots.

Correspondence Analysis Refer back to “Overview Row Points” dimensions 1 & 2

Correspondence Analysis Refer back to “Overview Column Points” dimensions 1 & 2

Correspondence Analysis Finally, the biplot correspondence map is obtained. Note the axes now encompass the most extreme values of both of the uniplots. While some generalizations can be made about the association of categories (South more conservative, West more liberal), the researcher must keep firmly in mind that correspondence is not association. That is, the researcher should not allow the map's display of inter-category distances to obscure the fact that, for this example, the model only explains 5.7% of the variance in the correspondence table.

Correspondence Analysis Refer back to “Overview Row Points” dimensions 1 & 2 and “Overview Column Points” dimensions 1 & 2.

Correspondence Analysis Care must be taken when interpreting the previous plot. It must be remembered that distances between columns and rows are not defined. “Symmetrical normalization (via the model button slide) is a technique used to standardize row and column data so as to be able to make general comparisons between the two. Other forms of standardization allow you to compare row variable points or column variable points, or rows or columns, but not rows to columns (see Garson, 2012 for further information on other standardization techniques for correspondence analysis).” Doey and Kurta 2011 (slide)
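
A sketch of how row and column scores might be computed for such a map. It assumes that symmetrical normalization means scaling the standard coordinates by the square root of each singular value, so that the inertia is spread equally over rows and columns. That reading is an assumption and should be checked against the SPSS documentation. The counts are hypothetical.

```python
import numpy as np

# Hypothetical 4 x 5 table of counts; illustrative numbers only.
N = np.array([
    [20, 30, 40, 25, 15],
    [15, 25, 50, 35, 25],
    [10, 20, 45, 45, 40],
    [25, 30, 35, 20, 10],
], dtype=float)

P = N / N.sum()
r = P.sum(axis=1)      # row masses
c = P.sum(axis=0)      # column masses
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))

U, sv, Vt = np.linalg.svd(S, full_matrices=False)
k = 2                  # two dimensions, as requested by /DIMENSIONS = 2

# Standard coordinates: mass-weighted variance of 1 on every dimension.
row_std = U[:, :k] / np.sqrt(r)[:, None]
col_std = Vt.T[:, :k] / np.sqrt(c)[:, None]

# Assumed symmetrical normalization: both sets scaled by sqrt(singular value).
row_scores = row_std * np.sqrt(sv[:k])
col_scores = col_std * np.sqrt(sv[:k])

print(row_scores.round(3))
print(col_scores.round(3))
```

The sign of each dimension is arbitrary in an SVD, so a map produced this way may be mirrored relative to one produced by other software without changing its interpretation.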

Correspondence Analysis Input Of A Collated Data Matrix An SPSS program that will do this operation is ANACOR, although since we are using data in table form, this has to be performed using the command syntax window.

Correspondence Analysis The data editor contains the collated data matrix. Note that we have only the matrix of interest in this view.

Correspondence Analysis You must use syntax, either via File > Open > Syntax

Correspondence Analysis With the prepared commands in an ASCII file:
ANACOR TABLE = ALL (5, 4)
  /DIMENSION = 2
  /NORMALIZATION = canonical
  /VARIANCES = COLUMNS
  /PLOT = NDIM(1, 2)
Note the keyword "ALL", since we are providing the table. Note "5" for the number of rows. Note "4" for the number of columns.

Correspondence Analysis Or via File > New > Syntax

Correspondence Analysis With the commands input into the Syntax Editor

Correspondence Analysis The solution is, of course, unchanged.

SPSS Tips Now you should go and try for yourself. Each week our cluster (5.05) is booked for 2 hours after this session. This will enable you to come and go as you please. Obviously other timetabled sessions for this module take precedence.