Correspondence Analysis Multivariate Chi Square

Goals of CA
– Produce a picture of multivariate data in one or two dimensions
– Analyze rows and columns simultaneously
– Plot both on a single scale
– Often shows chronological ordering

Data
– Counts or presence/absence for a series of cases or observations (rows) by a number of variables (columns)
– Composition data: assemblage, pollen, botanical, faunal, trace elements, etc.

Dimensions
– CA works by extracting orthogonal dimensions from the data table (similar to principal components)
– Typically one or two dimensions are extracted, but the maximum number of dimensions is min[(rows - 1), (columns - 1)]
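The slides' examples use R, but the dimension limit is easy to check directly: CA's dimensions correspond to the non-trivial singular values of the standardized residual matrix. A minimal NumPy sketch with an invented 4x3 table (not the Nelson data):

```python
# Sketch only: CA extracts at most min(rows-1, columns-1) dimensions.
# The counts below are made up for illustration.
import numpy as np

N = np.array([[20, 10,  5],
              [10, 15, 10],
              [ 5, 10, 20],
              [ 2,  8, 18]], dtype=float)   # 4 rows x 3 columns of counts

P = N / N.sum()                              # correspondence matrix
r = P.sum(axis=1)                            # row masses
c = P.sum(axis=0)                            # column masses
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))  # standardized residuals

sing = np.linalg.svd(S, compute_uv=False)
n_dims = int(np.sum(sing > 1e-12))           # non-trivial dimensions
print(n_dims, min(N.shape[0] - 1, N.shape[1] - 1))
```

Because the residual matrix has zero row and column sums, its rank (and hence the dimension count) can never exceed min(rows - 1, columns - 1).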

Plotting
– CA produces coordinates on each dimension for every row and column in the original data
– On the plot, the distance between two row points or two column points reflects their similarity or difference
– Row points help to interpret the pattern of column points, and vice versa

N. C. Nelson, Chronology of the Tano Ruins, New Mexico. American Anthropologist 18(2).

> round(prop.table(as.matrix(Nelson[,2:8]), 1) * 100, 2)
(row percentages for the seven pottery types: Corrugated, Biscuit, Type_I, Type_II_Red, Type_II_Yellow, Type_II_Gray, Type_III; numeric output omitted)

> library(MASS)    # corresp() is in the MASS package
> CaModel.1 <- corresp(Nelson[,2:8], nf=2)
> CaModel.1
First canonical correlation(s): (values omitted)
Row scores:
      [,1] [,2]
(values omitted)

Column scores:
                [,1] [,2]
Corrugated
Biscuit
Type_I
Type_II_Red
Type_II_Yellow
Type_II_Gray
Type_III
(numeric values omitted)

> str(CaModel.1)
List of 4
 $ cor   : num [1:2] ...
 $ rscore: num [1:10, 1:2] ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:10] "1" "2" "3" "4" ...
  .. ..$ : NULL
 $ cscore: num [1:7, 1:2] ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:7] "Corrugated" "Biscuit" "Type_I" ...
  .. ..$ : NULL
 $ Freq  : num [1:10, 1:7] ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ Row   : chr [1:10] "1" "2" "3" "4" ...
  .. ..$ Column: chr [1:7] "Corrugated" "Biscuit" "Type_I" ...
 - attr(*, "class")= chr "correspondence"
> biplot(CaModel.1, xlim=c(-1, .75))
> plot(CaModel.1$rscore, type="c")
> text(CaModel.1$rscore, as.character(1:10))

More Details
Package ca provides more statistics regarding the fit:
– install.packages("ca")
– library(ca)
– CaModel.2 <- ca(Nelson[,2:8])
– CaModel.2
– summary(CaModel.2)
– plot(CaModel.2, xlim=c(-1.3, .8))

CA Terminology 1
Principal inertias (eigenvalues) – a measure of the inertia (chi-square deviation from the mean) explained by each dimension
Mass – the weight of each row/column in the analysis (the proportion of cases in that row/column)
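For readers without R at hand, a hedged NumPy sketch with invented counts shows where the principal inertias come from: they are the squared singular values of the standardized residual matrix, and they add up to Pearson's chi-square divided by the grand total n, which is why CA can be read as a "multivariate chi square".

```python
# Sketch with invented counts: principal inertias are squared singular
# values of the standardized residuals, and their sum equals chi-square / n.
import numpy as np

N = np.array([[25, 10,  5],
              [10, 20, 10],
              [ 5, 10, 25]], dtype=float)
n = N.sum()
P = N / n
r = P.sum(axis=1)                            # row masses
c = P.sum(axis=0)                            # column masses

S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
inertias = np.linalg.svd(S, compute_uv=False) ** 2   # principal inertias

expected = np.outer(N.sum(axis=1), N.sum(axis=0)) / n
chi2 = ((N - expected) ** 2 / expected).sum()        # Pearson chi-square

print(inertias.sum(), chi2 / n)              # the two numbers agree
```

The masses are just the marginal proportions, so the row masses (and the column masses) each sum to 1.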

CA Terminology 2
ChiDist – how much a profile (row or column) differs from the mean profile
Inertia – deviation from average for that row/column
Dim. – the coordinates (scores) on each axis
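ChiDist is a chi-square distance, which can be checked by hand. A sketch with invented counts (the slides use R; this is NumPy for illustration): the distance for a row is computed between its profile and the average profile, with each squared difference weighted by the inverse column mass.

```python
# Sketch with invented counts: ChiDist for the first row is the chi-square
# distance between its profile and the mean (column-mass) profile.
import numpy as np

N = np.array([[30, 10, 10],
              [10, 30, 10],
              [10, 10, 30]], dtype=float)
P = N / N.sum()
c = P.sum(axis=0)                  # mean profile (column masses)
profile = N[0] / N[0].sum()        # profile of row 1

chi_dist = np.sqrt((((profile - c) ** 2) / c).sum())
print(chi_dist)
```

With these numbers the mean profile is (1/3, 1/3, 1/3) and row 1's profile is (0.6, 0.2, 0.2), giving a squared distance of 0.32.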

summary() output 1
mass = Mass * 1000
qlt = (quality) how well the row/column is represented in the plotted dimensions
inr = Inertia * 1000
cor = (relative contribution to inertia) the part of the quality contributed by each dimension

summary() output 2
ctr = (absolute contribution to inertia) the share of a dimension's inertia contributed by that row/column
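These quantities can be reproduced from the SVD directly. A hedged NumPy sketch with an invented 5x4 table (not output from the ca package): cor is the squared cosine of a point on a dimension, qlt sums cor over the plotted dimensions, and the ctr values for a dimension sum to 1 across the rows.

```python
# Sketch with invented counts: row principal coordinates from the SVD,
# then cor (squared cosines), qlt (2-D quality) and ctr (contributions).
import numpy as np

N = np.array([[18,  6,  6,  3],
              [ 6, 18,  6,  5],
              [ 5,  9, 21,  7],
              [10, 10,  2, 12],
              [ 4,  7,  9, 15]], dtype=float)
P = N / N.sum()
r, c = P.sum(axis=1), P.sum(axis=0)
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
U, sv, Vt = np.linalg.svd(S, full_matrices=False)

k = 2                                         # dimensions kept for the map
F = (U / np.sqrt(r)[:, None]) * sv            # row principal coordinates
d2 = (F ** 2).sum(axis=1)                     # squared chi-square distances
cor = F ** 2 / d2[:, None]                    # squared cosines per dimension
qlt = cor[:, :k].sum(axis=1)                  # quality in the 2-D map
ctr = (r[:, None] * F[:, :k] ** 2) / sv[:k] ** 2  # contribution per dimension

print(np.round(qlt, 3))                       # each value lies in [0, 1]
print(ctr.sum(axis=0))                        # each dimension sums to 1
```

This mirrors the logic of the summary() columns: a row with high qlt is well represented in the map, while a row with high ctr is one that shaped the dimension.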