Factor Analysis.


1 Factor Analysis

2 §1 Introduction Factor analysis (FA) is a method of simplifying data. The essential purpose of FA is to describe, if possible, the covariance relationships among many variables in terms of a few underlying, but unobservable, random quantities called factors. For example, in studies of corporate image or brand image, customers can evaluate a mall's performance with an index system of 24 indicators.

3 In practice, what customers care about most are the mall's environment, its service and the price of goods. From the 24 original variables, FA can find three latent factors that reflect environment, service and price respectively, and then evaluate the mall comprehensively. These factors, denoted $F_1, F_2, F_3$, are called underlying factors because they cannot be observed directly. They can be used to express all the original variables, but each original variable retains a part of its information that the common factors cannot express; this part is called the specific factor.

4 Notes: FA is different from regression analysis. The factors in FA are abstract, latent quantities, whereas the explanatory variables in regression analysis are observed and have specific meanings. FA is also different from PCA: PCA is a method of transforming variables, whereas FA requires constructing a factor model. In PCA, the new variables, called principal components (PCs), are linear combinations of the original variables. In FA, the original variables are expressed as linear combinations of a few underlying variables plus random error terms.

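5 § 2 Factor Model
Mathematical model: Suppose there are p standardized variables $X_1, X_2, \dots, X_p$. They can be expressed as
$$X_i = a_{i1}F_1 + a_{i2}F_2 + \cdots + a_{im}F_m + \varepsilon_i, \quad i = 1, 2, \dots, p.$$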

6 $F_1, F_2, \dots, F_m$ are common factors, which are unobservable. Their coefficients $a_{ij}$ are called factor loadings. $\varepsilon_i$ is the specific factor, the part of $X_i$ not captured by the first m common factors. These factors satisfy the following conditions: $F$ and $\varepsilon$ are uncorrelated, i.e. $\mathrm{Cov}(F, \varepsilon) = 0$; and $D(F) = I_m$, i.e. $F_1, \dots, F_m$ are uncorrelated and their variances are 1.

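7 $D(\varepsilon) = \mathrm{diag}(\sigma_1^2, \sigma_2^2, \dots, \sigma_p^2)$, i.e. the specific factors are uncorrelated with one another, but their variances are not necessarily equal.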

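8 Expression in matrix form: $X = AF + \varepsilon$, where $X = (X_1, \dots, X_p)'$, $F = (F_1, \dots, F_m)'$, $\varepsilon = (\varepsilon_1, \dots, \varepsilon_p)'$, and $A = (a_{ij})_{p \times m}$ is the factor loading matrix.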

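9 Statistical meanings of the factor loading matrix
1. Statistical meaning of the factor loading $a_{ij}$: $a_{ij}$ is the correlation coefficient between the ith variable $X_i$ and the jth common factor $F_j$. Starting from the model $X_i = a_{i1}F_1 + \cdots + a_{im}F_m + \varepsilon_i$, multiply both sides by $F_j$ and take expectations: $E(X_iF_j) = a_{i1}E(F_1F_j) + \cdots + a_{im}E(F_mF_j) + E(\varepsilon_iF_j)$. By the properties of the common factors in the model, $E(F_kF_j) = 0$ for $k \ne j$, $E(F_j^2) = 1$ and $E(\varepsilon_iF_j) = 0$, so $r_{X_iF_j} = E(X_iF_j) = a_{ij}$. Thus $a_{ij}$, the entry in row i and column j of the factor loading matrix, reflects the correlation between the ith variable and the jth common factor: the larger its absolute value, the more closely they are related.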

10 2. Statistical meaning of communality
Definition: The communality of $X_i$ is the sum of squares of the ith row of the factor loading matrix, i.e. $h_i^2 = \sum_{j=1}^{m} a_{ij}^2$. Statistical meaning: taking variances on both sides of the model gives $1 = \mathrm{Var}(X_i) = h_i^2 + \sigma_i^2$. That is, the variance contributions of all the common factors and of the specific factor sum to 1, and the closer $h_i^2$ is to 1, the more of the variance of $X_i$ the common factors explain.

11 3. Statistical meaning of the variance contribution of a common factor
The sum of squares of the jth column of the factor loading matrix, $g_j^2 = \sum_{i=1}^{p} a_{ij}^2$, is called the variance contribution of the common factor $F_j$; it is the sum of the variances that $F_j$ contributes to all the variables. The variance contribution measures how much influence the common factor has on the variables, and so indicates its relative importance.
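The three statistical meanings above can be checked numerically. The sketch below is a minimal illustration, assuming NumPy and a hypothetical 3×2 loading matrix (the numbers are invented for illustration, not taken from any example in these slides). It simulates standardized data from the model, confirms that the sample correlations between $X_i$ and $F_j$ approximate $a_{ij}$, and computes communalities (row sums of squares) and variance contributions (column sums of squares).

```python
import numpy as np

# Hypothetical loadings: p = 3 standardized variables, m = 2 common factors.
A = np.array([[0.8, 0.3],
              [0.6, 0.5],
              [0.2, 0.9]])
p, m = A.shape

h2 = (A ** 2).sum(axis=1)   # communality h_i^2: sum of squares of row i
sigma2 = 1.0 - h2           # specific variance, since Var(X_i) = h_i^2 + sigma_i^2 = 1
g2 = (A ** 2).sum(axis=0)   # variance contribution of F_j: sum of squares of column j

# Simulate X = A F + eps with Var(F_j) = 1, factors uncorrelated, F and eps uncorrelated.
rng = np.random.default_rng(0)
n = 200_000
F = rng.standard_normal((n, m))
eps = rng.standard_normal((n, p)) * np.sqrt(sigma2)   # column i has variance sigma_i^2
X = F @ A.T + eps

# Sample correlations corr(X_i, F_j): rows are variables, columns are factors.
corr_XF = np.corrcoef(np.hstack([X, F]), rowvar=False)[:p, p:]

print("communalities:         ", np.round(h2, 2))
print("specific variances:    ", np.round(sigma2, 2))
print("variance contributions:", np.round(g2, 2))
print("corr(X_i, F_j):\n", np.round(corr_XF, 2))
```

The simulated correlations should agree with A to about two decimal places. Note that the communalities and the variance contributions both add up to the same total, $\sum_i \sum_j a_{ij}^2$ (2.19 for this matrix).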

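12 § 3 Estimation of Factor Loading Matrix
Principal component method: Suppose the random vector $X$ has mean $\mu$ and covariance matrix $\Sigma$. Let $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge 0$ be the eigenvalues of $\Sigma$ and $e_1, e_2, \dots, e_p$ the corresponding standardized (unit-length) eigenvectors. Then
$$\Sigma = \lambda_1 e_1e_1' + \cdots + \lambda_p e_pe_p' = \left(\sqrt{\lambda_1}e_1, \dots, \sqrt{\lambda_p}e_p\right)\left(\sqrt{\lambda_1}e_1, \dots, \sqrt{\lambda_p}e_p\right)' = AA'.$$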

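13 The matrix A in the formula above is exact, but it is of little use in practice, because it has as many factors as there are variables. Our goal is to explain the original variables with a few common factors, so we drop the last p-m terms:
$$\Sigma \approx \left(\sqrt{\lambda_1}e_1, \dots, \sqrt{\lambda_m}e_m\right)\left(\sqrt{\lambda_1}e_1, \dots, \sqrt{\lambda_m}e_m\right)' = AA'.$$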

14 This approximation assumes that the specific factors in the model are unimportant; in that case, we can ignore the variances of the specific factors when decomposing $\Sigma$.
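A minimal NumPy sketch of the principal component method described above. It assumes the input is a correlation matrix R (the usual choice for standardized variables, used here in place of $\Sigma$), and it takes the specific variances as the diagonal of R - AA'; that choice of D is a common convention and an assumption here, not something the slides state. The example matrix is hypothetical, not the correlation matrix of the slide 15 example.

```python
import numpy as np

def pc_factor_loadings(R, m):
    """Principal component method: A = [sqrt(l_1) e_1, ..., sqrt(l_m) e_m]."""
    eigvals, eigvecs = np.linalg.eigh(R)            # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]               # reorder so l_1 >= l_2 >= ... >= l_p
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    A = eigvecs[:, :m] * np.sqrt(eigvals[:m])       # scale column j by sqrt(l_j)
    D = np.diag(np.diag(R - A @ A.T))               # assumed specific variances (diagonal)
    return A, D, eigvals

# Hypothetical 3 x 3 correlation matrix.
R = np.array([[1.0, 0.6, 0.5],
              [0.6, 1.0, 0.4],
              [0.5, 0.4, 1.0]])

A, D, eigvals = pc_factor_loadings(R, m=1)
print(np.round(eigvals, 3))
print(np.round(A, 3))
```

Keeping all p factors reproduces R exactly, while keeping m < p reproduces only its diagonal once D is added back. The sign of each eigenvector, and hence of each column of A, is arbitrary.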

15 Example: Suppose that in a city, the investment rate of fixed assets is $X_1$, the inflation rate is $X_2$ and the unemployment rate is $X_3$, with correlation matrix $R$. Find the factor model using the principal component method.

16 The eigenvalues are:

17 In this example, we can choose the first two factors, F1 and F2, as the common factors. F1 can be named the factor of price and employment; its variance contribution to X is $\sum_{i=1}^{3} a_{i1}^2$. F2 can be named the factor of investment; its variance contribution to X is $\sum_{i=1}^{3} a_{i2}^2$. The communalities $h_i^2 = a_{i1}^2 + a_{i2}^2$ show how much of each variable's variance F1 and F2 explain together.

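18 Properties of the factor model
1. Decomposition of the covariance matrix of X: $\Sigma_X = \mathrm{Cov}(AF + \varepsilon) = A\,D(F)\,A' + D(\varepsilon) = AA' + D$, where $D = D(\varepsilon)$ is diagonal. The smaller the diagonal entries of D, the more of X the common factors explain.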

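19 2. The model is not affected by the measurement units of the original data.
Transform the original variable X into $X^* = CX$, where $C = \mathrm{diag}(c_1, c_2, \dots, c_p)$ and $c_i > 0$. Then $X^* = CAF + C\varepsilon = A^*F + \varepsilon^*$ with $A^* = CA$ and $\varepsilon^* = C\varepsilon$. This is still a factor model with the same common factors F, because $D(\varepsilon^*) = C\,D(\varepsilon)\,C'$ is still diagonal.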


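21 3. Factor loadings are not unique.
Suppose T is an m×m orthogonal matrix. Let $A^* = AT$ and $F^* = T'F$. Then the model can be written as $X = AF + \varepsilon = ATT'F + \varepsilon = A^*F^* + \varepsilon$, and since $D(F^*) = T'T = I_m$ and $\mathrm{Cov}(F^*, \varepsilon) = 0$, it satisfies all the conditions of the factor model.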
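The three properties can be verified with a few lines of matrix algebra. This is a minimal NumPy sketch with a hypothetical loading matrix, hypothetical scale factors and a hypothetical rotation angle; it works with the model-implied covariance matrices rather than with data. It also shows one consequence of property 2: after standardizing $X^*$, the loadings return to A.

```python
import numpy as np

# Hypothetical loadings (p = 3, m = 2) and diagonal specific-variance matrix D.
A = np.array([[0.8, 0.3],
              [0.6, 0.5],
              [0.2, 0.9]])
D = np.diag(1.0 - (A ** 2).sum(axis=1))   # chosen so that each Var(X_i) = 1

# Property 1: Sigma_X = A A' + D.
Sigma = A @ A.T + D

# Property 2: change of units X* = C X, C = diag(c_1, ..., c_p), c_i > 0.
C = np.diag([2.0, 0.5, 10.0])             # hypothetical unit changes
Sigma_star = C @ Sigma @ C.T
A_star, D_star = C @ A, C @ D @ C.T       # loadings and specific variances of X*
scale = np.sqrt(np.diag(Sigma_star))      # standard deviations of X*
A_star_std = A_star / scale[:, None]      # loadings after standardizing X*

# Property 3: an m x m orthogonal T gives different loadings but the same A A'.
theta = np.pi / 6
T = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
A_rot = A @ T

print(np.allclose(Sigma_star, A_star @ A_star.T + D_star))  # True
print(np.allclose(A_star_std, A))                            # True
print(np.allclose(A_rot @ A_rot.T, A @ A.T))                 # True
```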

22 § 4 Factor Rotation The purpose of FA is not only to find the common factors and classify the variables, but also to understand the meaning of each common factor. Because the factor loading matrix is not unique, we can rotate it to simplify its structure. There are three main orthogonal rotation methods: quartimax rotation, varimax rotation and equimax rotation.

23 FA of decathlon scores
The variables are the scores in the ten events: 100 meters, long jump, shot put, high jump, 400 meters, 110-meter hurdles, discus throw, pole vault, javelin and 1500 meters.


25 From the factor loading matrix, we can see that the first common factor has high loadings on all variables, so it can be called a general athletic factor. However, the other factors are hard to interpret, so we should rotate the factors. The result is shown as follows.

26 Rotated factor loading matrix, with the communality (Comm.) of each variable.

27 Through rotation, the factors acquire more specific meanings.
The scores of 100 meters and 400 meters have large loadings on one factor, so it can be labeled the sprint-speed factor. The scores of shot put, discus throw and javelin have large loadings on another, which can be labeled the arm-strength factor. The scores of 110-meter hurdles, pole vault, long jump and high jump have large loadings on a third, which can be labeled the leg-strength factor. The remaining factor can be labeled the endurance factor for long-distance running.

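28 The communality of variables after rotation
Suppose $\Gamma$ is an orthogonal matrix and we obtain $B = A\Gamma$ by an orthogonal transformation. Then $BB' = A\Gamma\Gamma'A' = AA'$, so the ith communality $\sum_j b_{ij}^2$, which is the ith diagonal entry of $BB'$, equals $\sum_j a_{ij}^2$. After rotation, the communalities of the variables remain unchanged!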

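29 The variance contribution of factors after rotation
Suppose $\Gamma$ is an orthogonal matrix and we obtain $B = A\Gamma$ by an orthogonal transformation. After rotation, the variance contribution of each common factor, $\sum_i b_{ij}^2$, generally changes, although the total over all factors, $\sum_j \sum_i b_{ij}^2 = \sum_i h_i^2$, stays the same.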
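A quick numerical illustration of the two slides above, as a minimal NumPy sketch: a hypothetical 4×2 loading matrix is rotated by a hypothetical 30-degree orthogonal matrix $\Gamma$. Row sums of squares (communalities) are unchanged, while column sums of squares (variance contributions) shift between the factors and keep the same total.

```python
import numpy as np

# Hypothetical 4 x 2 loading matrix A and a 30-degree orthogonal rotation Gamma.
A = np.array([[0.9, 0.1],
              [0.8, 0.3],
              [0.2, 0.7],
              [0.1, 0.6]])
theta = np.pi / 6
Gamma = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
B = A @ Gamma                                    # rotated loadings

communality_A = (A ** 2).sum(axis=1)             # row sums of squares
communality_B = (B ** 2).sum(axis=1)
contrib_A = (A ** 2).sum(axis=0)                 # column sums of squares
contrib_B = (B ** 2).sum(axis=0)

print("communalities before/after:", communality_A.round(3), communality_B.round(3))
print("variance contributions before/after:", contrib_A.round(3), contrib_B.round(3))
print("totals:", round(float(contrib_A.sum()), 3), round(float(contrib_B.sum()), 3))
```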

30 1. Varimax method. Varimax is so called because it maximizes the sum of the variances of the squared loadings (squared correlations between variables and factors). If a few variables have high loadings on a single factor but near-zero loadings on the remaining factors, the meaning of that factor is easier to interpret. From the perspective of the individuals measured on the variables, varimax seeks a basis that represents each individual most economically; that is, each individual can be well described by a linear combination of only a few basis functions.
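The varimax criterion is usually maximized iteratively. Below is a minimal NumPy sketch of a widely used SVD-based iteration for raw varimax; it omits the Kaiser row normalization that many packages apply by default, which is a simplification here. The parameter `gamma` (a name chosen for this sketch) selects the member of the orthomax family: 1 gives varimax, 0 gives quartimax and k/2 gives equimax, the other two methods named in § 4. The test matrix is a hypothetical simple structure rotated by 30 degrees, so varimax should move it back toward loadings that are high on one factor and near zero on the other.

```python
import numpy as np

def varimax(A, gamma=1.0, max_iter=100, tol=1e-6):
    """Orthogonal rotation maximizing the raw orthomax criterion; returns (B, R), B = A R."""
    p, k = A.shape
    R = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        d_old = d
        L = A @ R
        # Gradient (up to a constant) of the criterion with respect to R.
        G = A.T @ (L ** 3 - (gamma / p) * L @ np.diag(np.diag(L.T @ L)))
        u, s, vh = np.linalg.svd(G)
        R = u @ vh                      # nearest orthogonal matrix to G
        d = s.sum()
        if d_old != 0 and d / d_old < 1 + tol:
            break                       # the criterion has stopped improving
    return A @ R, R

def varimax_criterion(B):
    """Sum over factors of the variance of the squared loadings in that column."""
    return (B ** 2).var(axis=0).sum()

# Hypothetical simple structure, rotated by 30 degrees to hide it.
simple = np.array([[0.9, 0.0],
                   [0.8, 0.0],
                   [0.0, 0.85],
                   [0.0, 0.7]])
theta = np.pi / 6
T = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
A = simple @ T

B, R = varimax(A)
print(np.round(B, 2))
print(varimax_criterion(A), varimax_criterion(B))
```

Because R is orthogonal, B has the same communalities as A; only the distribution of the loadings across factors changes. The column order and signs of B are arbitrary.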


33 § 5 Factor Scores
The concept of factor scores: So far, we have learned how to express the original variables as linear combinations of the common factors. Since there are fewer factors than original variables, we can use the estimated values of the common factors, called factor scores, in place of the original variables for mathematical modeling or for classifying and evaluating the samples, thereby reducing dimensionality and simplifying the problem.

34 Example: FA of the element endowments of China's 32 provinces
Seven indicators are chosen for the evaluation:
X1: population (ten thousand persons)
X2: area (square kilometers)
X3: GDP (100 million dollars)
X4: per capita water resources (cubic meters per person)
X5: per capita biomass resources (tons per person)
X6: number of college students per 10,000 persons (persons)
X7: number of scientists and engineers per 10,000 persons (persons)
Rotated Factor Pattern: loadings of X1 to X7 on FACTOR1, FACTOR2 and FACTOR3.

35 Each variable is expressed through the three rotated factors as $X_i = a_{i1}F_1 + a_{i2}F_2 + a_{i3}F_3$ (plus its specific factor), $i = 1, 2, \dots, 7$.
High-loading indicators and factor names:
Factor 1: X2 area; X4 per capita water resources; X5 per capita biomass resources. Name: natural resources.
Factor 2: X6 number of college students per 10,000 persons; X7 number of scientists and engineers per 10,000 persons. Name: human resources.
Factor 3: X1 population; X3 GDP. Name: aggregate indicators of development.

36 Standardized Scoring Coefficients of X1 to X7 on FACTOR1, FACTOR2 and FACTOR3. Each factor score is a linear combination of the standardized variables: $F_j = \beta_{j1}X_1 + \beta_{j2}X_2 + \cdots + \beta_{j7}X_7$, $j = 1, 2, 3$.

37 Scores of the first 3 factors
Table columns: REGION, FACTOR1, FACTOR2, FACTOR3. Regions listed: beijing, tianjin, hebei, shanxi1, neimeng, liaoning, jilin, heilongj, shanghai.

38 FA's mathematical model is $X = AF + \varepsilon$.
The original variables are expressed as linear combinations of the common factors. To obtain each factor's score, however, we must express the common factors as linear combinations of the original variables, i.e. the factor score functions
$$F_j = \beta_{j1}X_1 + \beta_{j2}X_2 + \cdots + \beta_{jp}X_p, \quad j = 1, 2, \dots, m.$$
Clearly, we first need the values of the coefficients in the score functions. Because p > m, the model cannot be inverted to give their exact values, so we have to estimate them.

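39 Estimate factor scores by regression
1) Idea: treat each common factor $F_j$ as the dependent variable and the standardized original variables as the independent variables, and estimate the score function $\hat F_j = \beta_{j1}X_1 + \cdots + \beta_{jp}X_p$ by regression.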

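40 Then, since $a_{ij} = r_{X_iF_j} = E(X_iF_j)$, we obtain the following equations:
$$a_{ij} = E\left[X_i(\beta_{j1}X_1 + \cdots + \beta_{jp}X_p)\right] = \beta_{j1}r_{i1} + \beta_{j2}r_{i2} + \cdots + \beta_{jp}r_{ip}, \quad i = 1, 2, \dots, p,$$
where $r_{ik}$ is the correlation coefficient between $X_i$ and $X_k$.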

41 In matrix form, $R\beta_j = a_j$, $j = 1, 2, \dots, m$, where $R = (r_{ik})_{p \times p}$ is the correlation matrix of the original variables.

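42 Here $\beta_j = (\beta_{j1}, \dots, \beta_{jp})'$ is the coefficient vector of the jth factor score function, and $a_j = (a_{1j}, \dots, a_{pj})'$ is the jth column of the factor loading matrix, so $\beta_j = R^{-1}a_j$. Note: to obtain the coefficients of all the score functions, we have to solve m such systems of equations, one for each factor.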
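Each factor's system of equations has the form R times the coefficient vector of the jth score function equals the jth column of the loading matrix A. Stacking the m systems side by side gives $RW = A$, so the regression estimate of the factor scores for a standardized observation x is $W'x = A'R^{-1}x$. A minimal NumPy sketch, assuming A and R are already available; the numbers are hypothetical, and R is built from the model as AA' + D only to keep the example self-contained. The function name `regression_factor_scores` is chosen for this sketch.

```python
import numpy as np

def regression_factor_scores(Z, R, A):
    """Regression-method factor scores for standardized data Z (n x p).

    Column j of W solves R w_j = a_j (the jth column of A), so scores = Z W.
    """
    W = np.linalg.solve(R, A)   # solves all m systems at once, without forming R^{-1}
    return Z @ W, W

# Hypothetical loadings and the correlation matrix they imply (R = A A' + D).
A = np.array([[0.8, 0.3],
              [0.6, 0.5],
              [0.2, 0.9]])
R = A @ A.T + np.diag(1.0 - (A ** 2).sum(axis=1))

# Two hypothetical standardized observations (rows) of X1, X2, X3.
Z = np.array([[ 1.0, 0.5, -0.2],
              [-0.3, 0.1,  1.2]])

scores, W = regression_factor_scores(Z, R, A)
print(np.round(W, 3))       # score-function coefficients, one column per factor
print(np.round(scores, 3))  # estimated F1, F2 for each observation
```

Using `np.linalg.solve` rather than an explicit inverse is the numerically preferred way to apply $R^{-1}$; the result is the same as `Z @ np.linalg.inv(R) @ A`.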

43 Example 1 National Quality of Life
The ultimate goal of a country's development is to improve its people's quality of life, so as to satisfy their material and cultural needs. In 1990, the United Nations Development Programme (UNDP) first used the human development index (HDI) to evaluate national quality of life. The HDI consists of three specific indicators: health (measured by life expectancy at birth), education (measured by mean years of schooling and expected years of schooling) and income per capita (measured by GDP or NI per capita). In short, the HDI can be used to rank countries comprehensively.

44 There are 7 indicators:
X1: life expectancy
X2: adult literacy rate
X3: combined enrollment rate
X4: GDP per capita ($)
X5: life expectancy index
X6: education index
X7: GDP per capita index

45 Rotated Factor Pattern: loadings of X1 to X7 on FACTOR1, FACTOR2 and FACTOR3 (for X1: 0.38129 on FACTOR1 and 0.41765 on FACTOR2). FACTOR1 is the factor of economic development, FACTOR2 is the factor of education, and FACTOR3 is the factor of health.

46 Variance explained by each factor (FACTOR1: 2.439700; FACTOR2; FACTOR3), and the final communality estimates of X1 to X7 with their total.

47 Example 2 What factors affect the fertility rate?
The fertility rate is influenced by many social, economic, cultural and policy factors. However, these factors do not act independently; they are correlated with one another. If we apply multiple regression directly, chances are that only a few variables are retained and the information in the other variables is lost. In this situation, FA can reveal the structure among the variables and address the problem with minimal loss of information. The original variables are the proportion of families with many children, the birth control rate, the percentage of citizens with at least a junior-school education, the percentage of citizens living in urban areas, and per capita national income. The following table shows data for 30 provinces in China in 1990.


49 Eigenvalues and variance contribution of each factor
Factor   Proportion   Cumulative
1        0.6498       0.6498
2        0.2429       0.8927
3        0.0503       0.9431
4        0.0368       0.9799
5        0.0201       1.0000

50 Factor structure before rotation: loadings of x1 to x5 on Factor1 and Factor2.

51 Communalities after rotation, and the variance explained by F1 and by F2.

52 Factor structure after varimax rotation, and standardized score function coefficients of x1 to x5 on Factor1 and Factor2.
In this example, we finally obtain two factors: Factor 1 is the factor of economic development and Factor 2 is the factor of the one-child policy. The factor scores can then be used for further analyses.

53 § 6 Steps and expectations of FA
1. Five steps of FA
(1) Choose the original variables. Variables should be chosen by combining qualitative and quantitative analysis. FA presupposes that the original variables are highly correlated; if this condition is not met, it is difficult to find common factors.
(2) Calculate the correlation matrix of the variables. The correlation matrix reflects the relationships among the original variables, and knowing whether the variables are correlated is essential to FA. In short, the correlation matrix is the basis for estimating the factor structure.

54 (3) Find the common factors. In this step, we first determine the extraction method and the number of common factors according to the study design or expert experience. Usually, the number of factors is decided from their variances (eigenvalues): factors whose variances are greater than 1 contribute more to the total variance. In general, if the cumulative variance contribution of the first m factors exceeds 60%, the number of factors can be set to m.
(4) Rotate the factors. The idea of rotation is to reduce the number of factors on which the variables under investigation have high loadings, which makes the results easier to interpret.

55 (5) Calculate factor scores.
Factor scores can then be used in other analyses; for example, as variables in cluster analysis or as explanatory variables in regression analysis.
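The two rules of thumb for the number of factors described above (keep the factors whose variance, i.e. eigenvalue, exceeds 1; or keep the smallest m whose cumulative contribution exceeds 60%) are easy to apply once the eigenvalues are known. A minimal NumPy sketch; the function name and the eigenvalues are hypothetical. In practice the two rules can disagree, so the final choice still rests on interpretation.

```python
import numpy as np

def choose_num_factors(eigvals, threshold=0.60):
    """Return (m by eigenvalue > 1, m by cumulative contribution, cumulative proportions)."""
    eigvals = np.sort(np.asarray(eigvals, dtype=float))[::-1]   # largest first
    m_kaiser = int((eigvals > 1.0).sum())                        # factors with variance > 1
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    m_cumulative = int(np.argmax(cumulative > threshold)) + 1    # first m above threshold
    return m_kaiser, m_cumulative, cumulative

# Hypothetical eigenvalues of a 5 x 5 correlation matrix (they sum to 5).
m_kaiser, m_cumulative, cumulative = choose_num_factors([2.0, 1.1, 0.9, 0.6, 0.4])
print(m_kaiser, m_cumulative, np.round(cumulative, 2))
```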

56 In practice, FA is highly subjective
In some published studies, FA gives reasonable interpretations; in most others, however, it is not as useful as expected. There is no accepted criterion for judging the quality of a factor analysis. In practice, the "WOW" criterion is often used: if, while examining the results, we can say "Wow, I understand these factors," the application is deemed successful.

57 Notes about FA and PCA: Both PCA and FA aim to reflect the information in the original variables, so the choice of original variables is very important. If the original variables are independent, FA and PCA do not work, because it is then hard to explain the original variables with a combination of a few new variables. So in both PCA and FA, correlation among the original variables is necessary.

58 Results of FA and PCA may not be as clear as in the examples above
Clear results depend on many aspects, such as the nature of the subject under study, the original variables chosen and the quality of the data. We should be cautious about rankings based on factor scores, especially for sensitive issues, because different choices of original variables may produce different factors and different rankings.

