Correspondence Analysis: Multivariate Chi Square
Goals of CA
– Produce a picture of multivariate data in one or two dimensions
– Analyze rows and columns simultaneously
– Plot both on a single scale
– The resulting plot often shows chronological ordering
Data
– Counts or presence/absence for a series of cases or observations (rows) by a number of variables (columns)
– Composition data: assemblage, pollen, botanical, faunal, trace elements, etc.
Dimensions
– CA works by extracting orthogonal dimensions from the data table (similarly to principal components)
– Typically one or two dimensions are extracted, but the maximum number of dimensions is min[(rows-1), (columns-1)]
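The dimension limit can be checked by hand. The sketch below uses base R only and a small made-up table of counts (hypothetical data, not the Nelson table); the names counts, P, rmass, cmass and S are illustrative. It takes the singular value decomposition of the standardized residuals, which is one common way to describe what CA computes, and counts the non-zero singular values.

```r
# Hypothetical counts (5 levels by 4 types); NOT the Nelson data.
counts <- matrix(c(30, 10,  2,  0,
                   20, 15,  5,  1,
                   10, 20, 10,  4,
                    4, 12, 18, 10,
                    1,  5, 20, 25),
                 nrow = 5, byrow = TRUE,
                 dimnames = list(paste0("L", 1:5), c("A", "B", "C", "D")))

P     <- counts / sum(counts)   # table of proportions
rmass <- rowSums(P)             # row masses
cmass <- colSums(P)             # column masses

# Standardized residuals: (observed - expected) / sqrt(expected)
expected <- outer(rmass, cmass)
S <- (P - expected) / sqrt(expected)

sing_vals <- svd(S)$d
max_dims  <- min(nrow(counts) - 1, ncol(counts) - 1)
n_nonzero <- sum(sing_vals > 1e-8)   # tolerance for floating-point zero

print(round(sing_vals, 4))
cat("maximum dimensions:", max_dims,
    " non-zero singular values:", n_nonzero, "\n")
```

With 5 rows and 4 columns the limit is min(4, 3) = 3, so the last singular value returned by svd() is zero up to rounding error.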
Plotting
– CA produces coordinates on each dimension for each row and column in the original data
– On the plot, the distance between two row points or two column points reflects their similarity or difference
– Row points help to understand the patterns of column points and vice versa
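To illustrate why distances between row points can be read as similarity, here is a minimal base R sketch on made-up counts (hypothetical data, not the Nelson table). It assumes the rows are plotted in principal coordinates and that all dimensions are kept; under those assumptions, the ordinary distances between row points equal the chi-square distances between the row profiles. A two-dimensional plot only approximates these distances.

```r
# Hypothetical counts (5 levels by 4 types); NOT the Nelson data.
counts <- matrix(c(30, 10,  2,  0,
                   20, 15,  5,  1,
                   10, 20, 10,  4,
                    4, 12, 18, 10,
                    1,  5, 20, 25),
                 nrow = 5, byrow = TRUE,
                 dimnames = list(paste0("L", 1:5), c("A", "B", "C", "D")))

P        <- counts / sum(counts)
rmass    <- rowSums(P)
cmass    <- colSums(P)
expected <- outer(rmass, cmass)
dec      <- svd((P - expected) / sqrt(expected))

# Row principal coordinates: one row per case, one column per dimension
row_coords <- (dec$u / sqrt(rmass)) %*% diag(dec$d)
rownames(row_coords) <- rownames(counts)

# Chi-square distances between row profiles, computed directly
profiles  <- P / rmass                              # each row sums to 1
chi_dists <- dist(sweep(profiles, 2, sqrt(cmass), "/"))

print(round(dist(row_coords), 3))   # distances between plotted row points
print(round(chi_dists, 3))          # chi-square distances between profiles
```

In this made-up table, L1 and L2 have similar profiles and end up close together, while L1 and L5 are far apart.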
N. C. Nelson. Chronology of the Tano Ruins, New Mexico. American Anthropologist 18(2):
> round(prop.table(as.matrix(Nelson[,2:8]),1)*100,2)
Corrugated Biscuit Type_I Type_II_Red Type_II_Yellow Type_II_Gray Type_III
> library(MASS)  # corresp() is provided by the MASS package
> CaModel.1 <- corresp(Nelson[,2:8], nf=2)
> CaModel.1
First canonical correlation(s):
Row scores:
[,1] [,2]
Column scores:
[,1] [,2]
Corrugated
Biscuit
Type_I
Type_II_Red
Type_II_Yellow
Type_II_Gray
Type_III
> str(CaModel.1)
List of 4
 $ cor   : num [1:2]
 $ rscore: num [1:10, 1:2]
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:10] "1" "2" "3" "4" ...
  .. ..$ : NULL
 $ cscore: num [1:7, 1:2]
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:7] "Corrugated" "Biscuit" "Type_I" ...
  .. ..$ : NULL
 $ Freq  : num [1:10, 1:7]
  ..- attr(*, "dimnames")=List of 2
  .. ..$ Row   : chr [1:10] "1" "2" "3" "4" ...
  .. ..$ Column: chr [1:7] "Corrugated" "Biscuit" "Type_I" ...
 - attr(*, "class")= chr "correspondence"
> biplot(CaModel.1, xlim=c(-1,.75))
> plot(CaModel.1$rscore, type="c")
> text(CaModel.1$rscore, as.character(1:10))
More Details
Package ca provides more statistics regarding the fit
– install.packages("ca")
– library(ca)
– CaModel.2 <- ca(Nelson[,2:8])
– CaModel.2
– summary(CaModel.2)
– plot(CaModel.2, xlim=c(-1.3,.8))
CA Terminology 1
– Principal Inertias (eigenvalues): a measure of the inertia (chi-square deviation from the mean) explained by each dimension
– Mass: the weight of each row/column in the analysis (the proportion of cases in that row/column)
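Both terms can be computed by hand. The following base R sketch uses made-up counts (hypothetical data, not the Nelson table) and assumes the principal inertias are the squared singular values of the standardized residuals. Under that assumption the total inertia equals the Pearson chi-square statistic divided by the grand total, which is the sense in which CA is a "multivariate chi square".

```r
# Hypothetical counts (5 levels by 4 types); NOT the Nelson data.
counts <- matrix(c(30, 10,  2,  0,
                   20, 15,  5,  1,
                   10, 20, 10,  4,
                    4, 12, 18, 10,
                    1,  5, 20, 25),
                 nrow = 5, byrow = TRUE,
                 dimnames = list(paste0("L", 1:5), c("A", "B", "C", "D")))

n        <- sum(counts)
P        <- counts / n
rmass    <- rowSums(P)          # row masses: share of all counts in each row
cmass    <- colSums(P)          # column masses
expected <- outer(rmass, cmass)

# Principal inertias (eigenvalues) = squared singular values
principal_inertias <- svd((P - expected) / sqrt(expected))$d^2
total_inertia      <- sum(principal_inertias)

# Pearson chi-square statistic for the same table
chisq_stat <- unname(chisq.test(counts)$statistic)

print(round(rmass, 3))
print(round(100 * principal_inertias / total_inertia, 1))  # percent per dimension
cat("total inertia:", total_inertia, " chi-square / n:", chisq_stat / n, "\n")
```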
CA Terminology 2
– ChiDist: the chi-square distance of a profile (row or column) from the mean profile
– Inertia: the deviation from average for this row/column, weighted by its mass (mass × ChiDist²)
– Dim.: the scores for each axis
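As a rough illustration of how ChiDist and Inertia relate, this base R sketch computes both for the rows of a made-up table (hypothetical data, not the Nelson table). It assumes that a row's inertia is its mass times its squared chi-square distance from the mean profile, so that the row inertias add up to the total inertia.

```r
# Hypothetical counts (5 levels by 4 types); NOT the Nelson data.
counts <- matrix(c(30, 10,  2,  0,
                   20, 15,  5,  1,
                   10, 20, 10,  4,
                    4, 12, 18, 10,
                    1,  5, 20, 25),
                 nrow = 5, byrow = TRUE,
                 dimnames = list(paste0("L", 1:5), c("A", "B", "C", "D")))

P     <- counts / sum(counts)
rmass <- rowSums(P)
cmass <- colSums(P)             # the mean row profile

profiles <- P / rmass           # row profiles (each row sums to 1)

# Chi-square distance of each row profile from the mean profile
dev      <- sweep(profiles, 2, cmass, "-")
chi_dist <- sqrt(rowSums(sweep(dev^2, 2, cmass, "/")))

# Inertia of each row: mass times squared distance
row_inertia <- rmass * chi_dist^2

# Total inertia computed directly from the whole table, for comparison
expected      <- outer(rmass, cmass)
total_inertia <- sum((P - expected)^2 / expected)

print(round(cbind(Mass = rmass, ChiDist = chi_dist, Inertia = row_inertia), 4))
cat("sum of row inertias:", sum(row_inertia),
    " total inertia:", total_inertia, "\n")
```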
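summary() output 1
– mass = Mass*1000
– qlt = quality: how well the row/column is represented by the extracted dimensions
– inr = the row/column's share of the total inertia, *1000
– cor = relative contribution to inertia: the share of the row/column's inertia explained by that dimension (its contribution to quality)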
summary() output 2
– ctr = absolute contribution to inertia: the share of that dimension's inertia contributed by the row/column
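The summary columns can be approximated by hand for the rows. This base R sketch uses made-up counts (hypothetical data, not the Nelson table) and the first two dimensions. It is meant to show how the quantities are defined, not to reproduce the exact output of summary() in package ca; the variable names are illustrative.

```r
# Hypothetical counts (5 levels by 4 types); NOT the Nelson data.
counts <- matrix(c(30, 10,  2,  0,
                   20, 15,  5,  1,
                   10, 20, 10,  4,
                    4, 12, 18, 10,
                    1,  5, 20, 25),
                 nrow = 5, byrow = TRUE,
                 dimnames = list(paste0("L", 1:5), c("A", "B", "C", "D")))

P        <- counts / sum(counts)
rmass    <- rowSums(P)
cmass    <- colSums(P)
expected <- outer(rmass, cmass)
dec      <- svd((P - expected) / sqrt(expected))

# Row principal coordinates on all dimensions
row_coords <- (dec$u / sqrt(rmass)) %*% diag(dec$d)
rownames(row_coords) <- rownames(counts)

sq_dist     <- rowSums(row_coords^2)       # squared ChiDist of each row
row_inertia <- rmass * sq_dist
k           <- 1:2                         # dimensions kept

cor_rows <- row_coords[, k]^2 / sq_dist    # share of row inertia per dimension
qlt_rows <- rowSums(cor_rows)              # quality over the kept dimensions
ctr_rows <- sweep(rmass * row_coords[, k]^2, 2, dec$d[k]^2, "/")
inr_rows <- row_inertia / sum(row_inertia) # share of total inertia

# Scale by 1000 and round, in the style of the summary() table
tab <- round(1000 * cbind(mass = rmass, qlt = qlt_rows, inr = inr_rows,
                          cor1 = cor_rows[, 1], ctr1 = ctr_rows[, 1],
                          cor2 = cor_rows[, 2], ctr2 = ctr_rows[, 2]))
print(tab)
```

Read across a row, cor says how much of that row's inertia each dimension captures; read down a column, ctr says how much each row contributes to that dimension, so each ctr column sums to 1 before scaling.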