Example: Computer Preference – 1 Factor

Presentation transcript:

Example: Computer Preference – 1 Factor

# Create the dataset.
Price <- c(6,7,6,5,7,6,5,6,3,1,2,5,2,3,1,2)
Software <- c(5,3,4,7,7,4,7,5,5,3,6,7,4,5,6,3)
Aesthetics <- c(3,2,4,1,5,2,2,4,6,7,6,7,5,6,5,7)
Brand <- c(4,2,5,3,5,3,1,4,7,5,7,6,6,5,5,7)
Friend <- c(7,2,5,6,2,4,1,7,3,2,6,7,6,2,4,5)
Family <- c(6,3,4,7,1,5,4,5,4,4,5,7,2,3,5,6)
data <- data.frame(Price, Software, Aesthetics, Brand, Friend, Family)

# Find the correlation matrix.
cor(data)

# Factor analysis is easy to do in R. Here we assume that the number of
# hidden variables is 1. (factanal() is in the stats package, which is
# attached by default.)
fa <- factanal(data, factors=1)

Example: Computer Preference – 1 Factor

# Results
Call:
factanal(x = data, factors = 1)

Uniquenesses:
     Price   Software Aesthetics      Brand     Friend     Family
     0.567      0.977      0.126      0.167      0.974      1.000

Loadings:
           Factor1
Price      -0.658
Software   -0.152
Aesthetics  0.935
Brand       0.912
Friend      0.161
Family

               Factor1
SS loadings      2.190
Proportion Var   0.365

Test of the hypothesis that 1 factor is sufficient.
The chi square statistic is 12.79 on 9 degrees of freedom.
The p-value is 0.172

Here, the factor analysis performs a null hypothesis test in which the null hypothesis is that the model described by the factor we have found predicts the data well. The chi-square goodness-of-fit statistic is 12.8, and the p-value is 0.17. This means we cannot reject the null hypothesis, so from a statistical perspective the factor predicts the data well. This is why the result says "Test of the hypothesis that 1 factor is sufficient."
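As a quick sanity check, the reported p-value can be recovered directly from the chi-square statistic and its degrees of freedom with base R's pchisq():

```r
# Upper-tail probability of a chi-square statistic of 12.79 on 9 df,
# i.e. the p-value reported by factanal() above.
p <- pchisq(12.79, df = 9, lower.tail = FALSE)
round(p, 3)  # 0.172, matching the printed output
```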

Example: Computer Preference – 2 Factors

fa <- factanal(data, factors=2)

Call:
factanal(x = data, factors = 2)

Uniquenesses:
     Price   Software Aesthetics      Brand     Friend     Family
     0.559      0.960      0.126      0.080      0.005      0.609

Loadings:
           Factor1 Factor2
Price      -0.657
Software   -0.161   0.119
Aesthetics  0.933
Brand       0.928   0.242
Friend      0.100   0.992
Family              0.620

               Factor1 Factor2
SS loadings      2.207   1.453
Proportion Var   0.368   0.242
Cumulative Var   0.368   0.610

Test of the hypothesis that 2 factors are sufficient.
The chi square statistic is 2.16 on 4 degrees of freedom.
The p-value is 0.706

The p-value gets larger, and the cumulative proportion of variance becomes 0.61 (with one factor, it is 0.37), so the model seems improved. Loadings shows the weights used to calculate the hidden variables from the observed variables. Obviously, though, the model improves as you add more factors, which shows the trade-off between the number of factors and the accuracy of the model. So how should we decide how many factors to pick?

How Many Factors to Use?

We found two factors in the example:

           Factor1 Factor2
Price      -0.657
Software   -0.161   0.119
Aesthetics  0.933
Brand       0.928   0.242
Friend      0.100   0.992
Family              0.620

In the printed results of FA, some coefficients are blank. This only means those coefficients are too small to display, not that they are necessarily equal to zero. You can see all the coefficients, with more precision, by doing something like fa$loadings[,1]. Although the goodness-of-fit test tells you whether the current number of factors is sufficient, it does not tell you whether that number of factors is large enough to describe the information in the original data. For instance, why don't we try three factors instead of one or two? There are a few ways to answer this question.
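One way to reveal the blanked-out small loadings, reusing the fa object fitted above, is to lower the display cutoff of the print method for loadings (it hides values below 0.1 by default):

```r
# Reuse the 2-factor fit from the earlier slide.
fa <- factanal(data, factors = 2)

# Show every loading, including those below the default 0.1 display cutoff.
print(fa$loadings, cutoff = 0, digits = 3)
```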

How Many Factors to Use? (1) Cumulative Variance

Similar to PCA, you can look at the cumulative proportion of variance and stop adding factors once it reaches some threshold. Deciding the threshold is somewhat heuristic: it can be 80%, as with PCA, or, if your focus is on reducing the number of variables, 50-60%.

(2) Kaiser Criterion

The Kaiser rule is to discard components whose eigenvalues are below 1.0; this rule is also used in SPSS. You can easily calculate the eigenvalues from the correlation matrix:

ev <- eigen(cor(data))
ev$values
[1] 2.45701130 1.68900056 0.89157047 0.60583326 0.27285334 0.08373107

Two eigenvalues are above 1.0, so we can determine that the number of factors should be 2. One problem with the Kaiser rule is that it often becomes too strict.
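The Kaiser rule above can be applied in a single line, reusing the data frame built in the first slide:

```r
ev <- eigen(cor(data))
# Count eigenvalues above 1.0: the number of factors the Kaiser rule keeps.
sum(ev$values > 1)  # 2 (only 2.457 and 1.689 exceed 1.0)
```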

How Many Factors to Use? (3) Scree Plot

Another way to determine the number of factors is a scree plot. You plot the components on the X axis and the eigenvalues on the Y axis, connect the points with lines, and look for the spot where the slope becomes less steep. How exactly to find that spot is, again, somewhat heuristic, and in some cases (particularly when the number of original variables is small, as in the example above) there is no clear spot; try making the plot with the following code to see. Nonetheless, it is good to know how to make a scree plot. You also need the nFactors package.

ev <- eigen(cor(data))
library(nFactors)
ap <- parallel(subject=nrow(data), var=ncol(data), rep=100, cent=0.05)
nS <- nScree(ev$values, ap$eigen$qevpea)
plotnScree(nS)

Another example

Motivating Example: Cohesion in Dragon Boat Paddler Cancer Survivors

Dragon boat paddling is an ancient Chinese sport that offers a unique blend of factors that could potentially enhance the quality of the lives of cancer survivor participants. Evaluating the efficacy of dragon boating to improve overall quality of life among cancer survivors has the potential to advance our understanding of the factors that influence quality of life among cancer survivors. We hypothesize that physical activity conducted within the context of the social support of a dragon boat team contributes significantly to improved overall quality of life, above and beyond a standard physical activity program, because the collective experience of dragon boating is likely enhanced by team sport factors such as cohesion, teamwork, and the goal of competition.

Methods: 134 cancer survivors self-selected into an 8-week dragon boat paddling intervention group or an organized walking program. Each study arm comprised a series of 3 groups of approximately 20-25 participants, with pre- and post-testing to compare quality-of-life and physical performance outcomes between study arms.

Cohesion Variables:

G1 (I do not enjoy being a part of the social environment of this exercise group)
G2 (I am not going to miss the members of this exercise group when the program ends)
G3 (I am unhappy with my exercise group's level of desire to exceed)
G4 (This exercise program does not give me enough opportunities to improve my personal performance)
G5 (For me, this exercise group has become one of the most important social groups to which I belong)
G6 (Our exercise group is united in trying to reach its goals for performance)
G7 (We all take responsibility for the performance by our exercise group)
G8 (I would like to continue interacting with some of the members of this exercise group after the program ends)
G9 (If members of our exercise group have problems in practice, everyone wants to help them)
G10 (Members of our exercise group do not freely discuss each athlete's responsibilities during practice)
G11 (I feel like I work harder during practice than other members of this exercise group)

Standard Result

------------------------------------
Variable     | Factor1   Factor2  |
-------------+--------------------+
notenjoy     | -0.3118    0.5870  |
notmiss      | -0.3498    0.6155  |
desireexceed | -0.1919    0.8381  |
personalpe~m | -0.2269    0.7345  |
importants~l |  0.5682   -0.1748  |
groupunited  |  0.8184   -0.1212  |
responsibi~y |  0.9233   -0.1968  |
interact     |  0.6238   -0.2227  |
problemshelp |  0.8817   -0.2060  |
notdiscuss   | -0.0308    0.4165  |
workharder   | -0.1872    0.5647  |
------------------------------------

How to Interpret?

Loadings represent correlations between an item and a factor. High loadings define a factor; low loadings mean the item does not "load" on that factor. This makes the loadings easy to skim. In this example, factor 1 is defined by G5, G6, G7, G8, and G9, and factor 2 is defined by G1, G2, G3, G4, G10, and G11.

Other things to note:
- The factors are (usually) independent.
- We need to "name" the factors, and it is important to check their face validity.
- These factors can now be "calculated" using this model: each person is assigned a factor score for each factor, ranging between -1 and 1.

(The slide repeats the loadings table from the previous slide, with the high loadings highlighted in red.)
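In R, per-person factor scores can be requested from factanal() directly via its scores argument; as an illustrative sketch on the computer-preference data frame from the earlier slides (the dragon boat results above come from a different package):

```r
# "regression" (Thomson) scoring; "Bartlett" is the other built-in option.
fa <- factanal(data, factors = 2, scores = "regression")

# One row per respondent, one column per factor.
head(fa$scores)
```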

How to Interpret?

Authors may conclude something like: "We were able to derive two factors from the 11 items. The first factor is defined as 'teamwork.' The second factor is defined as 'personal competitive nature.' These two factors describe 72% of the variance among the items."

Another example: http://rtutorialseries.blogspot.my/2011/10/r-tutorial-series-exploratory-factor.html

Reference: http://yatani.jp/teaching/doku.php?id=hcistats:fa