Published by Gloria White. Modified over 6 years ago.
1
Mini-Revision
Since week 5 we have learned about hypothesis testing:
- Is a “new” variate indistinguishable from a population? Z-scores and confidence limits.
- Is a sample indistinguishable from a population? Z-scores and confidence limits.
- Is a “new” variate indistinguishable from a sample? t-distribution and degrees of freedom (ν).
- Are two sample distributions indistinguishable? t-test and ν (paired and independent samples).
- Are multiple samples indistinguishable? ANOVA.
Since week 8 we have been adding more (multivariate) tools:
- LS linear regression to determine the amount of variance in one variable explained by another (& residuals!)
- Pearson’s correlation to determine if two dependent variables co-vary with respect to 1+ independent variable(s)
- Spearman’s rank-order correlation to determine if two dependent ordinal variables co-vary (NP)
- K-S test to determine if 2 cumulative frequency distributions are indistinguishable (& test for normality)
- Wilcoxon tests (NP t-test)
- Kruskal-Wallis (NP ANOVA)
2
Kruskal-Wallis (NP ANOVA)
> women <- read.csv("women.csv")
> height_rel1 <- women[which(women$Religion == 1),4]
> height_rel2 <- women[which(women$Religion == 2),4]
> height_rel3 <- women[which(women$Religion == 3),4]
> kruskal.test(list(height_rel1, height_rel2, height_rel3))

Kruskal-Wallis rank sum test
data: list(height_rel1, height_rel2, height_rel3)
Kruskal-Wallis chi-squared = , df = 2, p-value =

- Similar to Wilcoxon: the variates in each distribution are translated to ranks, then the average rank of each group and the total average rank are calculated.
- H0: R1 = R2 = R3 = Ra (each group's average rank equals the total average rank).
- Compare H to the critical value from the χ² table with α and k - 1 degrees of freedom, where k is the number of groups (df = 2 for the three groups here).
Instead of calculating a statistic called F as in ANOVA, the statistic is called H and is calculated as H = [12 / (N(N + 1))] × Σ(R_i² / n_i) - 3(N + 1). N is the total sample size of all groups. The summation takes the rank sum of a group (R_i), squares it, and divides by the n of that group (n_i); do this for all groups and sum.
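The H calculation described in the notes can be sketched by hand and compared with `kruskal.test`. The heights below are made-up values (not the `women.csv` data) and were chosen without ties, because R applies a tie correction that the plain formula leaves out.

```r
# Hypothetical heights (cm) for three groups; no tied values on purpose
heights <- list(
  g1 = c(150.1, 162.3, 158.7, 171.2, 165.4),
  g2 = c(155.6, 168.9, 174.5, 160.2),
  g3 = c(149.8, 153.3, 157.1, 163.0, 166.6, 152.4)
)

values <- unlist(heights)
group  <- rep(names(heights), lengths(heights))

# Rank all variates together, ignoring group membership
r <- rank(values)
N <- length(values)              # total sample size of all groups

# Rank sum and sample size of each group
rank_sum <- tapply(r, group, sum)
n        <- tapply(r, group, length)

# H = 12 / (N (N + 1)) * sum(R_i^2 / n_i) - 3 (N + 1)
H  <- 12 / (N * (N + 1)) * sum(rank_sum^2 / n) - 3 * (N + 1)
df <- length(heights) - 1        # number of groups minus one

# Critical value at alpha = 0.05 and the p-value from the chi-squared distribution
critical <- qchisq(0.95, df)
p_value  <- pchisq(H, df, lower.tail = FALSE)

# The built-in test should report the same statistic when there are no ties
kw <- kruskal.test(heights)
print(c(H_by_hand = H, H_builtin = unname(kw$statistic), critical = critical))
```

If H exceeds the critical value (equivalently, if the p-value is below α), H0 is rejected.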
3
Factor Analysis & PCA
Artefact FL BH CD ED FEL C BW BT BFA FA BRA Length
1 93 24 16 13 31 47 3.5 7 10 114 2 21 6 11 1.7 9 5 35 3 33 15 8 20 3.9 3.2 60 4 23 26 12 6.2 7.7 74 5.2 68 27 3.7 55 6.1 4.1 45 18 40 17.6 1.4 54 19 17 9.2 6.6 39
- PCA & FA are dimensional scaling techniques (distinct from clustering).
- The goal is to take multivariate data (generally many variables) and compress it into a new matrix of fewer variables, generally for EDA.
- Think of correlation analysis: if much of the variance of the dependent variables is controlled by an unknown independent variable (i.e., high r²), we could just focus on that single independent variable for analysis.
- PCA & FA do this for many variables recorded on sample units and produce either principal components or factors. PCs/factors are like “discovered” independent variables controlling many measured variables.
- Original variables are described by three types of variance: common, specific, and error.
  - Common: variates in all variables increase/decrease as the “discovered” independent variable increases/decreases.
  - Specific: variation specific to one variable and not found in the others.
  - Error: measurement error in a variable.
- Extracted factors account for common variance; extracted PCs account for all variances.
I want students to understand the rudiments so they can interpret PCA and Factor Analysis in what they read. This is generally EDA, not hypothesis testing in the strict sense.
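As a rough illustration of compressing several measured variables into fewer components, the sketch below runs R's `prcomp` on simulated data. The data frame, its column names, and the hidden `size` variable are assumptions invented for this example; they are not the artefact measurements from the slide.

```r
set.seed(1)
n <- 100

# Hypothetical hidden "independent" variable driving three of the measurements
size <- rnorm(n)

dat <- data.frame(
  length = 50 + 10 * size + rnorm(n, sd = 2),
  width  = 20 +  4 * size + rnorm(n, sd = 1),
  height = 15 +  3 * size + rnorm(n, sd = 1),
  angle  = rnorm(n, mean = 45, sd = 5)   # unrelated to size
)

# PCA on standardised variables (scale. = TRUE uses the correlation matrix)
pca <- prcomp(dat, scale. = TRUE)

# Share of the total variance carried by each principal component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(var_explained, 3))

# Loadings: how each original variable contributes to each component
print(round(pca$rotation, 2))

# How closely the first component's scores track the hidden variable
pc1_vs_size <- abs(cor(pca$x[, 1], size))
print(pc1_vs_size)
```

In this simulated setting the first component should stand in for `size`, which is the sense in which a PC behaves like a “discovered” independent variable.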
4
Factor Analysis & PCA
Extracting Factors & PCs
- In either procedure, the 1st factor/PC extracted accounts for the largest amount of variance, the 2nd accounts for the second largest amount, and so on.
- The first factor/PC is often interpreted as “size” in analyses of objects: lots of the dependent variables are correlated with some quantity called size.
Assumptions
- Variables should be normally distributed, homoscedastic, and linearly related, with some expectation of correlated variance.
- Ideally we want to account for an acceptable amount of variance with fewer factors than there are original variables.
- Factors need to be orthogonal to avoid correlations between factors.
If you can “interpret” what your 1st, 2nd, etc. factors might be (e.g., size), you could focus on these kinds of variables to explain variation in the data set.
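Two of the points above, that each successive component accounts for less variance and that the extracted components are orthogonal, can be checked directly. This sketch uses R's built-in `iris` measurements purely as a stand-in data set; it is an assumption of the example and has nothing to do with the artefact table.

```r
# Four numeric measurements from the built-in iris data (stand-in data)
measurements <- iris[, 1:4]

# Quick look at the "correlated variance" assumption:
# PCA/FA are only useful if the variables are correlated to begin with
print(round(cor(measurements), 2))

pca <- prcomp(measurements, scale. = TRUE)

# Variance of each component, in order of extraction
variances <- pca$sdev^2
print(round(variances, 3))

# Correlations between component scores; off-diagonal values should be ~0
score_cor <- cor(pca$x)
off_diag  <- score_cor[upper.tri(score_cor)]
print(round(score_cor, 3))
```

The variances come out in decreasing order, and the near-zero off-diagonal correlations show that the components do not overlap in the variance they describe.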
5
Factor Analysis & PCA
Extracting Factors & PCs
# Load the artefact measurements
fibdata <- read.csv("bronzefibs.csv")
# Maximum-likelihood factor analysis, extracting 4 factors
fa1_fibdata <- factanal(fibdata, 4)
print(fa1_fibdata)

Call:
factanal(x = fibdata, factors = 4)
...
Loadings:
Factor1 Factor2 Factor3 Factor4
FL BH CD ED FEL C BW BT Coils BFA FA BRA Length
SS loadings
Proportion Var
Cumulative Var

- Factor loadings: like Pearson’s r, these indicate the correlation between the original variables and the factor. Square them for the coefficient of determination (r²), which is how much of the variance in a variable is explained by the factor.
- SS loadings: the sum of squared loadings, equivalent to the eigenvalue. Factors with eigenvalues above 1 explain more variance than a single variable.
- Also reported: proportion of variance and cumulative variance.
What to do with factors?
- Do any factors appear to explain certain kinds of variation in an object (its size, its plan shape, its cutting edge, etc.)?
- Identifying these factors as independent variables can help you formulate analyses to measure these independent variables.
FL = foot length, BH = bow height, BFA = bow foot angle, FA = foot angle, CD = coil diameter, BRA = bow rear angle, ED = element diameter, FEL = foot extension length, C = catchplate, BW = bow width, BT = bow thickness, FEW = foot extension width, Coils = Number of coils
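To show how the quantities in the `factanal` printout relate to one another, here is a minimal sketch on simulated data. The column names borrow abbreviations from the slide, but the values, the two hidden variables `size` and `shape`, and the choice of two factors are all assumptions made for illustration; this is not the `bronzefibs.csv` analysis.

```r
set.seed(42)
n <- 300

# Two hypothetical latent variables
size  <- rnorm(n)
shape <- rnorm(n)
noise <- function() rnorm(n, sd = 0.6)

# Simulated measurements: three driven by size, three by shape
artefacts <- data.frame(
  FL  = 0.8 * size  + noise(),
  BH  = 0.8 * size  + noise(),
  BW  = 0.8 * size  + noise(),
  BFA = 0.8 * shape + noise(),
  FA  = 0.8 * shape + noise(),
  BRA = 0.8 * shape + noise()
)

fa <- factanal(artefacts, factors = 2)

# Loadings as a plain matrix (variables in rows, factors in columns)
L <- unclass(fa$loadings)

# Squared loadings: share of each variable's variance explained by each factor
r2 <- L^2

# SS loadings (eigenvalue-like), proportion and cumulative variance per factor
ss_loadings <- colSums(r2)
prop_var    <- ss_loadings / nrow(L)
cum_var     <- cumsum(prop_var)

# Communality: variance of each variable explained by all factors together
communality <- rowSums(r2)

print(round(L, 2))
print(round(rbind(ss_loadings, prop_var, cum_var), 3))
print(round(cbind(communality, uniqueness = fa$uniquenesses), 3))
```

The last table pairs each variable's communality with the uniqueness reported by `factanal`; the two should add up to roughly 1 for standardised variables.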
6
Factor Analysis & PCA
You can also look at how objects group together based on their factor scores (like the black dots here). Are they all of a certain kind?
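Grouping objects by factor score can be sketched as follows. Everything here is simulated: the two artefact kinds `A` and `B`, the variables `v1` to `v6`, and the hidden `size` and `shape` variables are hypothetical, and regression scores are just one of the score types `factanal` offers.

```r
set.seed(7)
n <- 200

# Two hypothetical kinds of artefact that differ in overall size
kind  <- rep(c("A", "B"), each = n / 2)
size  <- rnorm(n, mean = ifelse(kind == "A", -1, 1))
shape <- rnorm(n)
noise <- function() rnorm(n, sd = 0.6)

artefacts <- data.frame(
  v1 = size  + noise(), v2 = size  + noise(), v3 = size  + noise(),
  v4 = shape + noise(), v5 = shape + noise(), v6 = shape + noise()
)

# Ask factanal to return a score on each factor for every object
fa     <- factanal(artefacts, factors = 2, scores = "regression")
scores <- fa$scores

# Find the factor that tracks the hidden size variable most closely
size_factor <- which.max(abs(cor(scores, size)))

# Mean score on that factor for each kind of artefact
group_means <- tapply(scores[, size_factor], kind, mean)
print(round(group_means, 2))
```

Plotting the two columns of `scores` against each other, coloured by `kind`, would give a picture similar in spirit to the one on the slide, with objects of one kind clustering together.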