Example: Computer Preference – 1 Factor

```r
# Create the dataset (16 observations of 6 variables)
Price      <- c(6,7,6,5,7,6,5,6,3,1,2,5,2,3,1,2)
Software   <- c(5,3,4,7,7,4,7,5,5,3,6,7,4,5,6,3)
Aesthetics <- c(3,2,4,1,5,2,2,4,6,7,6,7,5,6,5,7)
Brand      <- c(4,2,5,3,5,3,1,4,7,5,7,6,6,5,5,7)
Friend     <- c(7,2,5,6,2,4,1,7,3,2,6,7,6,2,4,5)
Family     <- c(6,3,4,7,1,5,4,5,4,4,5,7,2,3,5,6)
data <- data.frame(Price, Software, Aesthetics, Brand, Friend, Family)

# Find the correlation matrix
cor(data)

# Factor analysis is easy to do in R: factanal() is in the stats package,
# which is loaded by default. Here we assume there is 1 hidden variable.
fa <- factanal(data, factors = 1)
fa
```
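The following minimal sketch is not part of the original slides. It shows how to look inside the fitted object, where factanal() stores the loadings and uniquenesses as list components. Under the factor model for standardized variables, each variable's communality (the sum of its squared loadings) plus its uniqueness should come to approximately 1, up to the optimizer's tolerance. The name `communality` and the rounding are illustrative choices.

```r
# Rebuild the dataset so this block runs on its own
Price      <- c(6,7,6,5,7,6,5,6,3,1,2,5,2,3,1,2)
Software   <- c(5,3,4,7,7,4,7,5,5,3,6,7,4,5,6,3)
Aesthetics <- c(3,2,4,1,5,2,2,4,6,7,6,7,5,6,5,7)
Brand      <- c(4,2,5,3,5,3,1,4,7,5,7,6,6,5,5,7)
Friend     <- c(7,2,5,6,2,4,1,7,3,2,6,7,6,2,4,5)
Family     <- c(6,3,4,7,1,5,4,5,4,4,5,7,2,3,5,6)
data <- data.frame(Price, Software, Aesthetics, Brand, Friend, Family)

fa <- factanal(data, factors = 1)

# Communality: share of each variable's variance explained by the common factor(s)
communality <- rowSums(unclass(fa$loadings)^2)

# For standardized variables, communality + uniqueness should be close to 1
check <- communality + fa$uniquenesses
round(cbind(communality, uniqueness = fa$uniquenesses, total = check), 3)
```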

Example: Computer Preference – 1 factor
Results

```
Call:
factanal(x = data, factors = 1)

Uniquenesses:
     Price   Software Aesthetics      Brand     Friend     Family

Loadings:
           Factor1
Price
Software
Aesthetics 0.935
Brand      0.912
Friend     0.161
Family

               Factor1
SS loadings      2.190
Proportion Var   0.365

Test of the hypothesis that 1 factor is sufficient.
The chi square statistic is 12.8 on 9 degrees of freedom.
The p-value is 0.172
```

Here, factanal() performs a hypothesis test. The null hypothesis is that the model built on the factor we found is sufficient to reproduce the observed correlations. The chi-square goodness-of-fit statistic is 12.8 and the p-value is 0.172, so we cannot reject the null hypothesis. From a statistical perspective, one factor describes the data adequately. This is why the output says "Test of the hypothesis that 1 factor is sufficient."
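The reported p-value can be reproduced from the stored statistic. This sketch relies on the `STATISTIC`, `dof` and `PVAL` components, which factanal() returns when the degrees of freedom are positive. The statistic already includes Bartlett's correction. For p variables and k factors, the degrees of freedom are ((p - k)^2 - (p + k)) / 2, which gives 9 here.

```r
# Rebuild the dataset so this block runs on its own
Price      <- c(6,7,6,5,7,6,5,6,3,1,2,5,2,3,1,2)
Software   <- c(5,3,4,7,7,4,7,5,5,3,6,7,4,5,6,3)
Aesthetics <- c(3,2,4,1,5,2,2,4,6,7,6,7,5,6,5,7)
Brand      <- c(4,2,5,3,5,3,1,4,7,5,7,6,6,5,5,7)
Friend     <- c(7,2,5,6,2,4,1,7,3,2,6,7,6,2,4,5)
Family     <- c(6,3,4,7,1,5,4,5,4,4,5,7,2,3,5,6)
data <- data.frame(Price, Software, Aesthetics, Brand, Friend, Family)

fa <- factanal(data, factors = 1)

p <- ncol(data)   # number of observed variables
k <- fa$factors   # number of factors

# Degrees of freedom of the goodness-of-fit test
dof <- ((p - k)^2 - (p + k)) / 2

# Chi-square statistic, degrees of freedom and p-value stored by factanal()
fa$STATISTIC
fa$dof
fa$PVAL

# The p-value is the upper tail of the chi-square distribution
p_value <- pchisq(fa$STATISTIC, df = fa$dof, lower.tail = FALSE)
p_value
```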

Example: Computer Preference – 2 factors
```r
fa <- factanal(data, factors = 2)
fa
```

```
Call:
factanal(x = data, factors = 2)

Uniquenesses:
     Price   Software Aesthetics      Brand     Friend     Family

Loadings:
           Factor1 Factor2
Price
Software
Aesthetics
Brand
Friend
Family

               Factor1 Factor2
SS loadings
Proportion Var
Cumulative Var

Test of the hypothesis that 2 factors are sufficient.
The chi square statistic is 2.16 on 4 degrees of freedom.
The p-value is 0.706
```

The p-value gets larger, and the cumulative proportion of variance rises to 0.61 (with one factor, it is 0.37). So the model seems to improve. The loadings are the weights that link the hidden factors to the observed variables. For standardized variables, they are the correlations between each variable and each factor. However, a model with more factors generally fits better, which shows the trade-off between the number of factors and the accuracy of the model. So how should we decide how many factors to keep?
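The sketch below makes the one-versus-two-factor comparison explicit by fitting both models and tabulating their fit statistics side by side. The helper `cum_var()` is hypothetical. It computes the cumulative proportion of variance as the sum of squared loadings divided by the number of variables. This matches how the printed "Cumulative Var" row is built, and a varimax rotation leaves the total unchanged.

```r
# Rebuild the dataset so this block runs on its own
Price      <- c(6,7,6,5,7,6,5,6,3,1,2,5,2,3,1,2)
Software   <- c(5,3,4,7,7,4,7,5,5,3,6,7,4,5,6,3)
Aesthetics <- c(3,2,4,1,5,2,2,4,6,7,6,7,5,6,5,7)
Brand      <- c(4,2,5,3,5,3,1,4,7,5,7,6,6,5,5,7)
Friend     <- c(7,2,5,6,2,4,1,7,3,2,6,7,6,2,4,5)
Family     <- c(6,3,4,7,1,5,4,5,4,4,5,7,2,3,5,6)
data <- data.frame(Price, Software, Aesthetics, Brand, Friend, Family)

# Hypothetical helper: total proportion of variance explained by all factors
cum_var <- function(fit) sum(unclass(fit$loadings)^2) / nrow(fit$loadings)

# Fit the 1-factor and 2-factor models
fits <- lapply(1:2, function(k) factanal(data, factors = k))

comparison <- data.frame(
  factors        = 1:2,
  cumulative_var = sapply(fits, cum_var),
  chi_square     = sapply(fits, function(f) unname(f$STATISTIC)),
  dof            = sapply(fits, function(f) f$dof),
  p_value        = sapply(fits, function(f) unname(f$PVAL))
)
print(comparison, digits = 3)
```

Reading the table row by row shows both sides of the trade-off: how much variance each model explains and whether the test still finds it sufficient.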

How many factors to use
We found two factors in the example, built from the variables Price, Software, Aesthetics, Brand, Friend, and Family.

Some loadings appear blank in the factanal() output. This does not mean they are zero. By default, print() hides loadings whose absolute value is below 0.1. You can see all coefficients at higher precision with fa$loadings[, 1] or print(fa$loadings, cutoff = 0).

The goodness-of-fit test tells you whether the current number of factors is statistically sufficient. It does not tell you how much of the information in the original data the factors capture, or whether a different number of factors would describe it better. For instance, why not try three factors instead of one or two? There are a few ways to answer this question.
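The sketch below covers the two points above: printing every loading with `cutoff = 0`, and checking how many factors six variables can support. It uses the degrees-of-freedom formula ((p - k)^2 - (p + k)) / 2 that factanal() applies. By that formula, three factors leave 0 degrees of freedom, so no chi-square test would be reported. Four factors give a negative value, which factanal() should reject as too many factors for six variables.

```r
# Rebuild the dataset so this block runs on its own
Price      <- c(6,7,6,5,7,6,5,6,3,1,2,5,2,3,1,2)
Software   <- c(5,3,4,7,7,4,7,5,5,3,6,7,4,5,6,3)
Aesthetics <- c(3,2,4,1,5,2,2,4,6,7,6,7,5,6,5,7)
Brand      <- c(4,2,5,3,5,3,1,4,7,5,7,6,6,5,5,7)
Friend     <- c(7,2,5,6,2,4,1,7,3,2,6,7,6,2,4,5)
Family     <- c(6,3,4,7,1,5,4,5,4,4,5,7,2,3,5,6)
data <- data.frame(Price, Software, Aesthetics, Brand, Friend, Family)

fa <- factanal(data, factors = 2)

# Show every loading, including those the default print hides (cutoff = 0.1)
print(fa$loadings, cutoff = 0, digits = 4)

# Loadings of the first factor as a plain named numeric vector
fa$loadings[, 1]

# Degrees of freedom left for the goodness-of-fit test with k factors
p <- ncol(data)
k <- 1:4
dof <- ((p - k)^2 - (p + k)) / 2
dof  # 9, 4, 0, -3
```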

How many factors to use

Cumulative variance
As with PCA, you can look at the cumulative proportion of variance and stop adding factors once it reaches a chosen threshold. Choosing the threshold is a heuristic. It can be 80%, as is common for PCA, or lower if your focus is on reducing the number of variables.

Kaiser criterion
The Kaiser rule is to discard components whose eigenvalues are below 1.0. This rule is also used in SPSS. You can easily calculate the eigenvalues from the correlation matrix:

```r
ev <- eigen(cor(data))
ev$values
```

Here two eigenvalues are at least 1, so we determine that the number of factors should be 2. One problem with the Kaiser rule is that it is often too strict.
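The sketch below applies both rules to the example data. One assumption matters here. The proportions are computed from the eigenvalues of the correlation matrix, which is PCA-style variance and is not identical to the "Proportion Var" that factanal() reports. The 0.8 threshold is the heuristic value mentioned above.

```r
# Rebuild the dataset so this block runs on its own
Price      <- c(6,7,6,5,7,6,5,6,3,1,2,5,2,3,1,2)
Software   <- c(5,3,4,7,7,4,7,5,5,3,6,7,4,5,6,3)
Aesthetics <- c(3,2,4,1,5,2,2,4,6,7,6,7,5,6,5,7)
Brand      <- c(4,2,5,3,5,3,1,4,7,5,7,6,6,5,5,7)
Friend     <- c(7,2,5,6,2,4,1,7,3,2,6,7,6,2,4,5)
Family     <- c(6,3,4,7,1,5,4,5,4,4,5,7,2,3,5,6)
data <- data.frame(Price, Software, Aesthetics, Brand, Friend, Family)

ev <- eigen(cor(data), symmetric = TRUE)$values

# Proportion and cumulative proportion of variance per component
# (eigenvalues of a correlation matrix sum to the number of variables)
prop <- ev / sum(ev)
cum_prop <- cumsum(prop)

# Cumulative-variance rule: fewest components reaching the threshold
threshold <- 0.8  # heuristic choice, as for PCA
n_cum <- which(cum_prop >= threshold)[1]

# Kaiser criterion: keep components whose eigenvalue is not below 1
n_kaiser <- sum(ev >= 1)

round(rbind(eigenvalue = ev, proportion = prop, cumulative = cum_prop), 3)
c(cumulative_rule = n_cum, kaiser_rule = n_kaiser)
```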

How many factors to use

Scree plot
Another way to determine the number of factors is a scree plot. You plot the component number on the x-axis and the eigenvalues on the y-axis, and connect the points with lines. You then look for the point where the slope becomes much less steep. How exactly should you find that point? Again, it is a heuristic. In some cases, particularly when the number of original variables is small as in the example above, there is no clear point like that (try making the plot with the following code). Nonetheless, it is good to know how to make a scree plot. You also need the nFactors package.

```r
ev <- eigen(cor(data))
library(nFactors)
ap <- parallel(subject = nrow(data), var = ncol(data), rep = 100, cent = 0.05)
nS <- nScree(ev$values, ap$eigen$qevpea)
plotnScree(nS)
```
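If nFactors is not available, a rough base-R version can be sketched. The block below draws the scree plot and adds a parallel-analysis threshold computed by simulation (Horn's method), using eigenvalues of random, uncorrelated normal data of the same size. The 100 replications, the 95th percentile, and the seed are assumptions. This is not the nFactors implementation, so its numbers may differ from nScree().

```r
# Rebuild the dataset so this block runs on its own
Price      <- c(6,7,6,5,7,6,5,6,3,1,2,5,2,3,1,2)
Software   <- c(5,3,4,7,7,4,7,5,5,3,6,7,4,5,6,3)
Aesthetics <- c(3,2,4,1,5,2,2,4,6,7,6,7,5,6,5,7)
Brand      <- c(4,2,5,3,5,3,1,4,7,5,7,6,6,5,5,7)
Friend     <- c(7,2,5,6,2,4,1,7,3,2,6,7,6,2,4,5)
Family     <- c(6,3,4,7,1,5,4,5,4,4,5,7,2,3,5,6)
data <- data.frame(Price, Software, Aesthetics, Brand, Friend, Family)

ev <- eigen(cor(data), symmetric = TRUE)$values
n <- nrow(data)
p <- ncol(data)

# Simulate eigenvalues of correlation matrices of uncorrelated normal data
set.seed(123)
n_sim <- 100
sim_ev <- replicate(n_sim, {
  z <- matrix(rnorm(n * p), nrow = n, ncol = p)
  eigen(cor(z), symmetric = TRUE, only.values = TRUE)$values
})  # p x n_sim matrix

# 95th percentile of the simulated eigenvalues at each component position
threshold <- apply(sim_ev, 1, quantile, probs = 0.95)

# Keep the leading components whose eigenvalue beats the random threshold
n_parallel <- sum(cumprod(ev > threshold))

# Scree plot: component number vs eigenvalue, with reference lines
plot(seq_len(p), ev, type = "b", pch = 19,
     xlab = "Component", ylab = "Eigenvalue", main = "Scree plot")
lines(seq_len(p), threshold, type = "b", lty = 2)  # parallel-analysis threshold
abline(h = 1, lty = 3)                             # Kaiser criterion
legend("topright", lty = c(1, 2, 3),
       legend = c("Observed", "Parallel analysis (95%)", "Eigenvalue = 1"))
n_parallel
```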

Another example

Motivating Example: Cohesion in Dragon Boat paddler cancer survivors
Dragon boat paddling is an ancient Chinese sport that offers a unique blend of factors that could potentially enhance the quality of life of cancer survivor participants. Evaluating how well dragon boating improves overall quality of life among cancer survivors could advance our understanding of the factors that influence quality of life in this population. We hypothesize that physical activity conducted within the social support of a dragon boat team contributes significantly to improved overall quality of life, above and beyond a standard physical activity program. The reason is that the collective experience of dragon boating is likely enhanced by team sport factors such as cohesion, teamwork, and the goal of competition.
Methods: 134 cancer survivors self-selected into an 8-week dragon boat paddling intervention group or an organized walking program. Each study arm consisted of a series of 3 groups of participants, with pre- and post-testing to compare quality of life and physical performance outcomes between study arms.

Cohesion variables:
G1 (I do not enjoy being a part of the social environment of this exercise group)
G2 (I am not going to miss the members of this exercise group when the program ends)
G3 (I am unhappy with my exercise group's level of desire to exceed)
G4 (This exercise program does not give me enough opportunities to improve my personal performance)
G5 (For me, this exercise group has become one of the most important social groups to which I belong)
G6 (Our exercise group is united in trying to reach its goals for performance)
G7 (We all take responsibility for the performance by our exercise group)
G8 (I would like to continue interacting with some of the members of this exercise group after the program ends)
G9 (If members of our exercise group have problems in practice, everyone wants to help them)
G10 (Members of our exercise group do not freely discuss each athlete's responsibilities during practice)
G11 (I feel like I work harder during practice than other members of this exercise group)

How to interpret?
- Loadings represent correlations between item and factor, so they range between -1 and 1.
  - High loadings define a factor.
  - Low loadings: the item does not "load" on the factor.
- The loadings are easy to skim. In this example:
  - factor 1 is defined by G5, G6, G7, G8, G9
  - factor 2 is defined by G1, G2, G3, G4, G10, G11
- Other things to note:
  - factors are (usually) 'independent', i.e. uncorrelated
  - we need to 'name' the factors; it is important to check their face validity
- These factors can now be 'calculated' using this model: each person is assigned a factor score for each factor.

```
    Variable | Factor1  Factor2
-------------+------------------
    notenjoy |
     notmiss |
desireexceed |
personalpe~m |
importants~l |
 groupunited |
responsibi~y |
    interact |
problemshelp |
  notdiscuss |
  workharder |
```

High loadings are highlighted in red.
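The cohesion data are not included here, so this sketch reuses the computer preference data to show the same interpretation steps in R. It lists the variables that define each factor and computes one factor score per person. The 0.4 cutoff for a "high" loading is an illustrative assumption, not a rule from the slides. `scores = "regression"` asks factanal() for Thomson's regression scores, which are centered at 0 and not bounded to [-1, 1].

```r
# Rebuild the dataset so this block runs on its own
Price      <- c(6,7,6,5,7,6,5,6,3,1,2,5,2,3,1,2)
Software   <- c(5,3,4,7,7,4,7,5,5,3,6,7,4,5,6,3)
Aesthetics <- c(3,2,4,1,5,2,2,4,6,7,6,7,5,6,5,7)
Brand      <- c(4,2,5,3,5,3,1,4,7,5,7,6,6,5,5,7)
Friend     <- c(7,2,5,6,2,4,1,7,3,2,6,7,6,2,4,5)
Family     <- c(6,3,4,7,1,5,4,5,4,4,5,7,2,3,5,6)
data <- data.frame(Price, Software, Aesthetics, Brand, Friend, Family)

# Varimax (orthogonal) rotation keeps the factors uncorrelated in the model
fa <- factanal(data, factors = 2, rotation = "varimax", scores = "regression")

L <- unclass(fa$loadings)
cutoff <- 0.4  # illustrative threshold for a "high" loading (assumption)

# For each factor, list the variables whose absolute loading reaches the cutoff
defining <- lapply(seq_len(ncol(L)), function(j) rownames(L)[abs(L[, j]) >= cutoff])
names(defining) <- colnames(L)
defining

# One score per person per factor; regression scores have mean 0
head(fa$scores)
summary(fa$scores)
```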
