
Mini-Revision

Since week 5 we have learned about hypothesis testing:
- Is a "new" variate indistinguishable from a population? Using Z-scores and confidence limits
- Is a sample indistinguishable from a population? Using Z-scores and confidence limits
- Is a "new" variate indistinguishable from a sample? Using the t-distribution and degrees of freedom (ν)
- Are two sample distributions indistinguishable? Using the t-test and ν (paired and independent samples)
- ANOVA, to see whether multiple samples are indistinguishable

Since week 8 we have been adding more (multivariate) tools:
- Least-squares linear regression, to determine the amount of variance in one variable explained by another (and residuals!)
- Pearson's correlation, to determine whether two dependent variables co-vary with respect to one or more independent variables
- Spearman's rank-order correlation, to determine whether two dependent ordinal variables co-vary (non-parametric)
- The K-S test, to determine whether two cumulative frequency distributions are indistinguishable (and to test for normality)
- Wilcoxon tests (non-parametric t-test)
- Kruskal-Wallis (non-parametric ANOVA)
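As a reminder of how these tests are invoked in R, here is a minimal sketch on made-up numbers (the vectors a and b are hypothetical, not course data):

# Two small hypothetical samples of measurements
a <- c(5.1, 4.8, 6.2, 5.9, 5.4)
b <- c(4.9, 5.2, 5.8, 6.1, 5.0)

t.test(a, b)                         # independent-samples t-test
wilcox.test(a, b)                    # Wilcoxon (non-parametric t-test)
cor.test(a, b, method = "pearson")   # Pearson's correlation
cor.test(a, b, method = "spearman")  # Spearman's rank-order correlation
ks.test(a, b)                        # K-S test on two distributions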

Kruskal-Wallis (non-parametric ANOVA)

> women <- read.csv("women.csv")
> height_rel1 <- women[which(women$Religion == 1), 4]
> height_rel2 <- women[which(women$Religion == 2), 4]
> height_rel3 <- women[which(women$Religion == 3), 4]
> kruskal.test(list(height_rel1, height_rel2, height_rel3))

        Kruskal-Wallis rank sum test

data:  list(height_rel1, height_rel2, height_rel3)
Kruskal-Wallis chi-squared = 0.63421, df = 2, p-value = 0.7283

Similar to the Wilcoxon test: the variates in each distribution are translated to ranks, and each group's average rank and the total average rank are calculated.

H0: R̄1 = R̄2 = R̄3 = R̄a (each group's mean rank equals the overall average rank)

Instead of calculating a statistic called F, as in ANOVA, the statistic is called H:

H = [12 / (N(N + 1))] × Σ (Rj² / nj) − 3(N + 1)

where N is the total sample size across all groups. The summation term is each group's rank sum, squared, divided by that group's n, summed over all groups. Compare H to the critical value from the χ² table with α and k − 1 degrees of freedom, where k is the number of groups (here, df = 3 − 1 = 2).
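To make the formula concrete, here is a minimal sketch of the hand calculation on made-up heights (women.csv is not reproduced here, so g1–g3 are hypothetical groups); with no ties, H matches the statistic kruskal.test() reports:

# Hypothetical heights for three groups
g1 <- c(158, 162, 171, 165)
g2 <- c(160, 169, 163)
g3 <- c(157, 166, 172, 168, 161)

x   <- c(g1, g2, g3)
grp <- rep(1:3, times = c(length(g1), length(g2), length(g3)))

r  <- rank(x)                  # pooled ranks across all groups
N  <- length(x)                # total sample size
Rj <- tapply(r, grp, sum)      # rank sum of each group
nj <- tapply(r, grp, length)   # n of each group

H <- 12 / (N * (N + 1)) * sum(Rj^2 / nj) - 3 * (N + 1)
H                              # hand-computed statistic (exact when there are no ties)
kruskal.test(x, grp)           # should report the same chi-squared value, df = 2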

Factor Analysis & PCA

[Slide table: twelve measurements (FL, BH, CD, ED, FEL, C, BW, BT, BFA, FA, BRA, Length) recorded on a series of bronze fibulae artefacts]

PCA and FA are dimensional scaling techniques (distinct from clustering). The goal is to take multivariate data (generally many variables) and compress it into a new matrix of fewer variables, generally for exploratory data analysis (EDA).

Think of correlation analysis: if much of the variance of the dependent variables is controlled by an unknown independent variable (i.e., a high r²), you could focus on that single independent variable for analysis. PCA and FA do this for many variables recorded on sample units, and produce either principal components or factors. PCs/factors are like "discovered" independent variables controlling many of the measured variables.

The original variables are described by three types of variance:
- Common: variates in all variables increase/decrease as the "discovered" independent variable increases/decreases
- Specific: variation specific to one variable, not found in the others
- Error: measurement error in a variable

Extracted factors measure/account for the common variance; extracted PCs account for all of the variance. I want you to understand the rudiments so you can interpret PCA and factor analysis in what you read. These are generally EDA techniques, not hypothesis testing in the strict sense.
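A minimal R sketch of PCA, assuming simulated data in place of the artefact table above (the variable names and the common "size" factor are illustrative, not the real measurements):

set.seed(1)
# Simulate 30 artefacts measured on 5 variables that share a common "size" factor
size <- rnorm(30, mean = 50, sd = 10)
dat <- data.frame(
  FL     = size * 1.0 + rnorm(30, sd = 3),
  BH     = size * 0.4 + rnorm(30, sd = 2),
  CD     = size * 0.2 + rnorm(30, sd = 1),
  ED     = size * 0.3 + rnorm(30, sd = 2),
  Length = size * 1.5 + rnorm(30, sd = 4)
)

pc <- prcomp(dat, scale. = TRUE)  # scale: the variables are on different scales
summary(pc)                       # proportion of variance per component
pc$rotation[, 1]                  # loadings on PC1 -- all one sign: a "size" component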

Factor Analysis & PCA: Extracting Factors & PCs

In either procedure, the first factor/PC extracted accounts for the largest amount of variance, the second accounts for the second largest amount, and so on. The first factor/PC is often interpreted as "size" in analyses of objects: many of the dependent variables are correlated with some quantity called size. Ideally, you want to account for an acceptable amount of variance with fewer factors than there are original variables. Factors need to be orthogonal, to avoid correlations between factors. If you can "interpret" what your 1st, 2nd, etc. factors might be (e.g., size), you can focus on those kinds of variables to explain variation in the data set.

Assumptions: the variables should be normally distributed, homoscedastic, and linearly related, and there should be some expectation of correlated variance.
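Continuing the simulated dat and pc from the sketch above, here are a few quick checks one might run before settling on a number of factors/components (a sketch, not a full assumptions workup):

cor(dat)                       # expect sizeable correlations if common variance exists
pairs(dat)                     # eyeball linearity between variable pairs
pc$sdev^2                      # eigenvalues (for scaled PCA); values > 1 explain
                               # more variance than a single original variable
screeplot(pc, type = "lines")  # scree plot: look for the "elbow"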

Factor Analysis & PCA: Extracting Factors & PCs

fibdata <- read.csv("bronzefibs.csv")
fa1_fibdata <- factanal(fibdata, 4)
print(fa1_fibdata)

Call:
factanal(x = fibdata, factors = 4)
...
Loadings: Factor1 Factor2 Factor3 Factor4
FL      0.809   0.359   0.340   0.311
BH      0.765   0.361  -0.448
CD      0.315   0.693   0.394   0.233
ED      0.842  -0.161
FEL     0.934   0.277   0.102
C       0.532   0.424   0.568   0.457
BW     -0.259
BT      0.195   0.143   0.342  -0.801
Coils  -0.694   0.128
BFA     0.814   0.108   0.102
FA      0.258   0.522
BRA     0.669  -0.212  -0.117  -0.158
Length  0.726   0.549   0.310   0.115

(factanal's print method blanks out loadings below a cutoff, 0.1 by default, which is why some rows show fewer than four values)

               Factor1 Factor2 Factor3 Factor4
SS loadings      3.654   2.306   1.762   1.559
Proportion Var   0.281   0.177   0.136   0.120
Cumulative Var   0.281   0.458   0.594   0.714

Factor loadings are like Pearson's r: they indicate the covariance between the original variables and the factor. Square a loading to get a coefficient of determination (r²); that is how much of the variance in the variable is explained by the factor.

SS loadings (the sum of the squared loadings) is equivalent to the eigenvalue; eigenvalues above 1 explain more variance than a single variable does. The output also gives each factor's proportion of variance and the cumulative variance.

What to do with factors? Do any factors appear to explain certain kinds of variation in an object (its size, its plan shape, its cutting edge, etc.)? Identifying these factors as independent variables can help you formulate analyses to measure them.

Key: FL = foot length, BH = bow height, BFA = bow foot angle, FA = foot angle, CD = coil diameter, BRA = bow rear angle, ED = element diameter, FEL = foot extension length, C = catchplate, BW = bow width, BT = bow thickness, FEW = foot extension width, Coils = number of coils.
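A short sketch of where those summary rows come from, run on the simulated dat from earlier rather than bronzefibs.csv (which is not reproduced here):

fa <- factanal(dat, factors = 1)  # one factor suffices for the simulated data
L  <- loadings(fa)                # loadings matrix (small values hidden in print only)
colSums(L^2)                      # SS loadings: sum of squared loadings per factor
colSums(L^2) / nrow(L)            # Proportion Var: SS loadings / number of variables
L[, 1]^2                          # r-squared: variance in each variable explained by the factor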

Factor Analysis & PCA: Factor Scores

You can also look at how objects group together based on their factor scores (the original slide shows a scatterplot of factor scores, with one group of objects marked as black dots). Are the objects that cluster together all of a certain kind?
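A hedged sketch of how such a plot is produced, on freshly simulated data (the two latent factors and all variable names here are invented for illustration); scores = "regression" is one of factanal's built-in scoring methods:

set.seed(2)
# Simulate 40 fibulae driven by two latent factors: "size" and "shape"
size  <- rnorm(40, mean = 50, sd = 10)
shape <- rnorm(40, mean = 0, sd = 1)
fib <- data.frame(
  FL  = size * 1.0 + rnorm(40, sd = 3),
  BH  = size * 0.5 + shape * 2.0 + rnorm(40, sd = 2),
  CD  = shape * 3.0 + rnorm(40, sd = 1),
  ED  = size * 0.3 - shape * 2.0 + rnorm(40, sd = 2),
  BT  = shape * 2.5 + rnorm(40, sd = 1),
  Len = size * 1.4 + rnorm(40, sd = 4)
)

fa2 <- factanal(fib, factors = 2, scores = "regression")
plot(fa2$scores, xlab = "Factor 1 scores", ylab = "Factor 2 scores")
# Points that plot close together behave similarly on the extracted
# factors -- are they all objects of a certain kind?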