
Mini-Revision

Since week 5 we have learned about hypothesis testing:
- Is a "new" variate indistinguishable from a population? Using Z-scores and confidence limits
- Is a sample indistinguishable from a population? Using Z-scores and confidence limits
- Is a "new" variate indistinguishable from a sample? Using the t-distribution and degrees of freedom (ν)
- Are two sample distributions indistinguishable? Using the t-test and ν (paired and independent samples)
- ANOVA, to see whether multiple samples are indistinguishable

Since week 8 we have been adding more (multivariate) tools:
- Least-squares linear regression, to determine the amount of variance in one variable explained by another (and residuals!)
- Pearson's correlation, to determine whether two dependent variables co-vary with respect to one or more independent variables
- Spearman's rank-order correlation, to determine whether two dependent ordinal variables co-vary (non-parametric)
- The K-S test, to determine whether two cumulative frequency distributions are indistinguishable (and to test for normality)
- Wilcoxon tests (non-parametric t-test)
- Kruskal-Wallis (non-parametric ANOVA)
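As a reminder of how these tests are invoked in R, here is a minimal sketch on made-up numbers (the vectors a and b are hypothetical, not course data):

# Two small hypothetical samples of measurements
a <- c(5.1, 4.8, 6.2, 5.9, 5.4)
b <- c(4.9, 5.2, 5.8, 6.1, 5.0)

t.test(a, b)                         # independent-samples t-test
wilcox.test(a, b)                    # Wilcoxon (non-parametric t-test)
cor.test(a, b, method = "pearson")   # Pearson's correlation
cor.test(a, b, method = "spearman")  # Spearman's rank-order correlation
ks.test(a, b)                        # K-S test on two distributions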

Kruskal-Wallis (non-parametric ANOVA)

> women <- read.csv("women.csv")
> height_rel1 <- women[which(women$Religion == 1), 4]
> height_rel2 <- women[which(women$Religion == 2), 4]
> height_rel3 <- women[which(women$Religion == 3), 4]
> kruskal.test(list(height_rel1, height_rel2, height_rel3))

        Kruskal-Wallis rank sum test

data:  list(height_rel1, height_rel2, height_rel3)
Kruskal-Wallis chi-squared = 0.63421, df = 2, p-value = 0.7283

Similar to the Wilcoxon test: the variates in each distribution are translated to ranks, and each group's average rank and the total average rank are calculated.

H0: R̄1 = R̄2 = R̄3 = R̄a (each group's mean rank equals the overall average rank)

Instead of calculating a statistic called F, as in ANOVA, the statistic is called H:

H = [12 / (N(N + 1))] × Σ (Rj² / nj) − 3(N + 1)

where N is the total sample size across all groups. The summation term is each group's rank sum, squared, divided by that group's n, summed over all groups. Compare H to the critical value from the χ² table with α and k − 1 degrees of freedom, where k is the number of groups (here, df = 3 − 1 = 2).
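To make the formula concrete, here is a minimal sketch of the hand calculation on made-up heights (women.csv is not reproduced here, so g1–g3 are hypothetical groups); with no ties, H matches the statistic kruskal.test() reports:

# Hypothetical heights for three groups
g1 <- c(158, 162, 171, 165)
g2 <- c(160, 169, 163)
g3 <- c(157, 166, 172, 168, 161)

x   <- c(g1, g2, g3)
grp <- rep(1:3, times = c(length(g1), length(g2), length(g3)))

r  <- rank(x)                  # pooled ranks across all groups
N  <- length(x)                # total sample size
Rj <- tapply(r, grp, sum)      # rank sum of each group
nj <- tapply(r, grp, length)   # n of each group

H <- 12 / (N * (N + 1)) * sum(Rj^2 / nj) - 3 * (N + 1)
H                              # hand-computed statistic (exact when there are no ties)
kruskal.test(x, grp)           # should report the same chi-squared value, df = 2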

Factor Analysis & PCA

[Slide table: twelve measurements (FL, BH, CD, ED, FEL, C, BW, BT, BFA, FA, BRA, Length) recorded on a series of bronze fibulae artefacts]

PCA and FA are dimensional scaling techniques (distinct from clustering). The goal is to take multivariate data (generally many variables) and compress it into a new matrix of fewer variables, generally for exploratory data analysis (EDA).

Think of correlation analysis: if much of the variance of the dependent variables is controlled by an unknown independent variable (i.e., a high r²), you could focus on that single independent variable for analysis. PCA and FA do this for many variables recorded on sample units, and produce either principal components or factors. PCs/factors are like "discovered" independent variables controlling many of the measured variables.

The original variables are described by three types of variance:
- Common: variates in all variables increase/decrease as the "discovered" independent variable increases/decreases
- Specific: variation specific to one variable, not found in the others
- Error: measurement error in a variable

Extracted factors measure/account for the common variance; extracted PCs account for all of the variance. I want you to understand the rudiments so you can interpret PCA and factor analysis in what you read. These are generally EDA techniques, not hypothesis testing in the strict sense.
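A minimal R sketch of PCA, assuming simulated data in place of the artefact table above (the variable names and the common "size" factor are illustrative, not the real measurements):

set.seed(1)
# Simulate 30 artefacts measured on 5 variables that share a common "size" factor
size <- rnorm(30, mean = 50, sd = 10)
dat <- data.frame(
  FL     = size * 1.0 + rnorm(30, sd = 3),
  BH     = size * 0.4 + rnorm(30, sd = 2),
  CD     = size * 0.2 + rnorm(30, sd = 1),
  ED     = size * 0.3 + rnorm(30, sd = 2),
  Length = size * 1.5 + rnorm(30, sd = 4)
)

pc <- prcomp(dat, scale. = TRUE)  # scale: the variables are on different scales
summary(pc)                       # proportion of variance per component
pc$rotation[, 1]                  # loadings on PC1 -- all one sign: a "size" component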

Factor Analysis & PCA: Extracting Factors & PCs

In either procedure, the first factor/PC extracted accounts for the largest amount of variance, the second accounts for the second largest amount, and so on. The first factor/PC is often interpreted as "size" in analyses of objects: many of the dependent variables are correlated with some quantity called size. Ideally, you want to account for an acceptable amount of variance with fewer factors than there are original variables. Factors need to be orthogonal, to avoid correlations between factors. If you can "interpret" what your 1st, 2nd, etc. factors might be (e.g., size), you can focus on those kinds of variables to explain variation in the data set.

Assumptions: the variables should be normally distributed, homoscedastic, and linearly related, and there should be some expectation of correlated variance.
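Continuing the simulated dat and pc from the sketch above, here are a few quick checks one might run before settling on a number of factors/components (a sketch, not a full assumptions workup):

cor(dat)                       # expect sizeable correlations if common variance exists
pairs(dat)                     # eyeball linearity between variable pairs
pc$sdev^2                      # eigenvalues (for scaled PCA); values > 1 explain
                               # more variance than a single original variable
screeplot(pc, type = "lines")  # scree plot: look for the "elbow"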

Factor Analysis & PCA: Extracting Factors & PCs

fibdata <- read.csv("bronzefibs.csv")
fa1_fibdata <- factanal(fibdata, 4)
print(fa1_fibdata)

Call:
factanal(x = fibdata, factors = 4)
...
Loadings: Factor1 Factor2 Factor3 Factor4
FL      0.809   0.359   0.340   0.311
BH      0.765   0.361  -0.448
CD      0.315   0.693   0.394   0.233
ED      0.842  -0.161
FEL     0.934   0.277   0.102
C       0.532   0.424   0.568   0.457
BW     -0.259
BT      0.195   0.143   0.342  -0.801
Coils  -0.694   0.128
BFA     0.814   0.108   0.102
FA      0.258   0.522
BRA     0.669  -0.212  -0.117  -0.158
Length  0.726   0.549   0.310   0.115

(factanal's print method blanks out loadings below a cutoff, 0.1 by default, which is why some rows show fewer than four values)

               Factor1 Factor2 Factor3 Factor4
SS loadings      3.654   2.306   1.762   1.559
Proportion Var   0.281   0.177   0.136   0.120
Cumulative Var   0.281   0.458   0.594   0.714

Factor loadings are like Pearson's r: they indicate the covariance between the original variables and the factor. Square a loading to get a coefficient of determination (r²); that is how much of the variance in the variable is explained by the factor.

SS loadings (the sum of the squared loadings) is equivalent to the eigenvalue; eigenvalues above 1 explain more variance than a single variable does. The output also gives each factor's proportion of variance and the cumulative variance.

What to do with factors? Do any factors appear to explain certain kinds of variation in an object (its size, its plan shape, its cutting edge, etc.)? Identifying these factors as independent variables can help you formulate analyses to measure them.

Key: FL = foot length, BH = bow height, BFA = bow foot angle, FA = foot angle, CD = coil diameter, BRA = bow rear angle, ED = element diameter, FEL = foot extension length, C = catchplate, BW = bow width, BT = bow thickness, FEW = foot extension width, Coils = number of coils.
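A short sketch of where those summary rows come from, run on the simulated dat from earlier rather than bronzefibs.csv (which is not reproduced here):

fa <- factanal(dat, factors = 1)  # one factor suffices for the simulated data
L  <- loadings(fa)                # loadings matrix (small values hidden in print only)
colSums(L^2)                      # SS loadings: sum of squared loadings per factor
colSums(L^2) / nrow(L)            # Proportion Var: SS loadings / number of variables
L[, 1]^2                          # r-squared: variance in each variable explained by the factor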

Factor Analysis & PCA: Factor Scores

You can also look at how objects group together based on their factor scores (the original slide shows a scatterplot of factor scores, with one group of objects marked as black dots). Are the objects that cluster together all of a certain kind?
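A hedged sketch of how such a plot is produced, on freshly simulated data (the two latent factors and all variable names here are invented for illustration); scores = "regression" is one of factanal's built-in scoring methods:

set.seed(2)
# Simulate 40 fibulae driven by two latent factors: "size" and "shape"
size  <- rnorm(40, mean = 50, sd = 10)
shape <- rnorm(40, mean = 0, sd = 1)
fib <- data.frame(
  FL  = size * 1.0 + rnorm(40, sd = 3),
  BH  = size * 0.5 + shape * 2.0 + rnorm(40, sd = 2),
  CD  = shape * 3.0 + rnorm(40, sd = 1),
  ED  = size * 0.3 - shape * 2.0 + rnorm(40, sd = 2),
  BT  = shape * 2.5 + rnorm(40, sd = 1),
  Len = size * 1.4 + rnorm(40, sd = 4)
)

fa2 <- factanal(fib, factors = 2, scores = "regression")
plot(fa2$scores, xlab = "Factor 1 scores", ylab = "Factor 2 scores")
# Points that plot close together behave similarly on the extracted
# factors -- are they all objects of a certain kind?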