Principal Component Analysis and Partial Least Squares: Two Dimension Reduction Techniques for Regression
Jun Yan and Saikat Maitra, 2008 CAS Spring Meeting

Three Major Types of Predictive Modeling in the P&C Industry
- Predictive modeling for pricing: the most popular application in actuarial work
- Predictive modeling for underwriting: common in commercial lines
- Predictive modeling for marketing and sales: the classic application of predictive modeling
Variable selection and dimension reduction are needed in all three settings: pricing, underwriting, and marketing.

Why Not Stepwise Regression?
Two major assumptions behind stepwise regression:
1. The regression error is normally distributed
   - The proper distribution assumption for a frequency model is Poisson
   - The proper distribution assumption for a severity model is Gamma
   - The proper distribution assumption for a loss ratio model is Tweedie
2. The predictive variables are independent
   - Many demographic variables are correlated
   - Many historical loss experience variables are correlated
   - Many policy characteristics for pricing and underwriting are correlated
   - Many weather variables are correlated
   - …

Principal Component Analysis (PCA) and Partial Least Squares (PLS)
Two major common effects of using PCA or PLS:
- Convert a group of correlated predictive variables into a group of independent variables
- Construct a "strong" predictive variable from several "weaker" predictive variables
Major difference between PCA and PLS:
- PCA is performed without considering the target variable, so PCA is an unsupervised analysis
- PLS is performed to maximize the correlation between the target variable and the predictive variables, so PLS is a supervised analysis

Principal Component Analysis (PCA)
In principal component analysis, we look for a few linear combinations of the predictive variables that can be used to summarize the data without losing too much information. Intuitively, principal component analysis is a method of extracting information from higher-dimensional data by projecting it onto a lower dimension.
Example: Consider the scatter plot of 3-dimensional data (3 variables). The data across the 3 variables are highly correlated, and the majority of the points cluster around the center of the space. The direction of greatest spread is also the direction of the 1st PC, which gives roughly equal weight to the 3 variables.
[Figures: scatter plot of highly correlated 3D data; the same plot with the 1st PC drawn on the chart]
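As a rough illustration of this intuition (not the authors' data), the sketch below simulates three highly correlated variables and recovers the direction of the 1st PC, which ends up weighting the three variables roughly equally; the simulation and names are assumptions made for this example.

```python
import numpy as np

# Minimal sketch on simulated data: three variables driven by one common factor.
rng = np.random.default_rng(0)
latent = rng.normal(size=1000)
X = np.column_stack([latent + 0.3 * rng.normal(size=1000) for _ in range(3)])

R = np.corrcoef(X, rowvar=False)        # 3x3 correlation matrix of the data
eigvals, eigvecs = np.linalg.eigh(R)    # eigenvalues returned in ascending order
first_pc = eigvecs[:, -1]               # direction of the 1st principal component
print(np.round(first_pc, 2))            # roughly equal weight on the 3 variables
```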

Standardized Linear Combinations
A linear combination of a set of vectors (X1, X2, ..., Xp) is an expression of the type ΣαiXi (i = 1 to p), where the αi's are scalar weights. A linear combination is said to be normalized or standardized if Σ|αi| = 1 (the sum of the absolute values of the weights equals 1). In the rest of this discussion we will refer to a standardized linear combination as an SLC. In the previous example, the 1st principal component is a linear combination of the original variables X1, X2, X3 with α1 = -0.56, α2 = -0.57 and α3 = -0.59.
A set of vectors is said to be linearly independent if none of them can be written as a linear combination of the others. In other words, a set of vectors (X1, X2, ..., Xp) is linearly independent if ΣαiXi = 0 implies αi = 0 for all i. A set of vectors that is not linearly independent is said to be linearly dependent. Correlation is a statistical measure of linear dependence.
A data set of n observations on p predictive variables can be represented as a matrix with n rows and p columns, where each row is an observation on the p predictive variables. We refer to this matrix as the data matrix X (n × p). Both the variance-covariance matrix and the correlation matrix will be denoted by Σ in this discussion.
The rank of a matrix is the maximum number of linearly independent rows or columns. A matrix with independent columns is said to be of full column rank. With correlated multivariate data, the data matrix is thus effectively not of full column rank.
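A small sketch of these definitions follows; the data matrix and weights are arbitrary placeholders, not values from the paper.

```python
import numpy as np

# Illustrative only: a linear combination of the columns of a data matrix X (n x p),
# its standardized (SLC) form with sum(|alpha_i|) = 1, and the rank of X.
rng = np.random.default_rng(1)
X = rng.normal(size=(10, 3))                 # placeholder data matrix, n = 10, p = 3
alpha = np.array([2.0, -1.0, 1.0])           # arbitrary scalar weights
combo = X @ alpha                            # linear combination sum_i alpha_i * X_i
alpha_slc = alpha / np.abs(alpha).sum()      # rescaled so that the |alpha_i| sum to 1
slc = X @ alpha_slc                          # standardized linear combination (SLC)
print(np.linalg.matrix_rank(X))              # number of linearly independent columns
```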

Eigenvalue Decomposition and Principal Components
Eigenvalue (spectral) decomposition of a matrix: any symmetric matrix A (p × p) can be written as A = ΓΛΓ^T = Σ λi γ(i) γ(i)^T, where Λ (p × p) is a diagonal matrix (all off-diagonal elements are 0) and Γ (p × p) is an orthonormal matrix, i.e. ΓΓ^T = I (the identity matrix). The diagonal elements of Λ are denoted λi (i = 1 to p) and the columns of Γ are denoted γ(i) (i = 1 to p). In matrix algebra, the λi's are called the eigenvalues of A and the γ(i)'s are the corresponding eigenvectors. If A is not of full rank, i.e. rank(A) = r < p, then only r eigenvalues in the decomposition are non-zero; the remaining eigenvalues equal 0.
Principal components: in principal component analysis we arrive at suitable SLCs of the data matrix X based on the eigenvalue decomposition of the variance-covariance or correlation matrix Σ of X. Let x (1 × p) = (x1, x2, ..., xp) denote a random vector observation in the data matrix, with mean μ (1 × p) and covariance matrix Σ. The principal components are given by the transformation x (1 × p) → y (1 × p) = (x - μ) Γ, where Γ is obtained from the eigenvalue decomposition of Σ, i.e. Γ^T Σ Γ = Λ = diag(λ1, λ2, ..., λp), with the λi's being the eigenvalues of the decomposition.
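A minimal NumPy sketch of this transformation follows, assuming an illustrative covariance structure rather than the paper's data; `Gamma` and `lam` play the roles of Γ and the λi's above.

```python
import numpy as np

# Sketch of y = (x - mu) * Gamma, with Gamma from the eigen decomposition of Sigma.
rng = np.random.default_rng(2)
X = rng.multivariate_normal(mean=[0.0, 0.0, 0.0],
                            cov=[[1.0, 0.8, 0.6],
                                 [0.8, 1.0, 0.7],
                                 [0.6, 0.7, 1.0]],
                            size=500)

mu = X.mean(axis=0)
Sigma = np.cov(X, rowvar=False)              # sample variance-covariance matrix
lam, Gamma = np.linalg.eigh(Sigma)           # Sigma = Gamma * diag(lam) * Gamma^T
order = np.argsort(lam)[::-1]                # put the largest eigenvalue first
lam, Gamma = lam[order], Gamma[:, order]
Y = (X - mu) @ Gamma                         # rows of Y are the principal component scores
```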

Properties of Principal Components
PCA is thus based on the eigenvalue decomposition of the variance-covariance or correlation matrix of the predictive variables, and each principal component is effectively a linear combination of the predictive variables whose weights are the eigenvectors obtained from that decomposition.
The following result justifies the use of PCA as a valid variable reduction technique in regression problems. Let x be a random p-dimensional vector with mean μ and covariance matrix Σ, and let y be the vector of principal components as defined above. Then the following hold:
(i) E(yi) = 0
(ii) Var(yi) = λi
(iii) Cov(yi, yj) = 0 for i ≠ j
(iv) Var(y1) ≥ Var(y2) ≥ ... ≥ Var(yp)
(v) No SLC of x has variance larger than λ1, the variance of the 1st principal component.
(vi) If z = Σαixi is an SLC of x that is uncorrelated with the first k principal components, then the variance of z is maximized when z equals the (k+1)th principal component.
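The sketch below numerically checks properties (ii) through (iv) on illustrative data (again, not the paper's): the sample variances of the components match the eigenvalues in decreasing order, and the components are uncorrelated.

```python
import numpy as np

# Check: Var(y_i) ~ lambda_i in decreasing order, and Cov(y_i, y_j) ~ 0 for i != j.
rng = np.random.default_rng(3)
X = rng.multivariate_normal([0.0, 0.0, 0.0],
                            [[1.0, 0.8, 0.6],
                             [0.8, 1.0, 0.7],
                             [0.6, 0.7, 1.0]], size=5000)
Sigma = np.cov(X, rowvar=False)
lam, Gamma = np.linalg.eigh(Sigma)
lam, Gamma = lam[::-1], Gamma[:, ::-1]       # descending eigenvalue order
Y = (X - X.mean(axis=0)) @ Gamma

print(np.round(lam, 3))                      # lambda_1 >= lambda_2 >= lambda_3
print(np.round(Y.var(axis=0, ddof=1), 3))    # sample variances of the components (~ lam)
print(np.round(np.cov(Y, rowvar=False), 3))  # ~ diagonal: components are uncorrelated
```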

Properties of Principal Components (contd.)
The previous results show that the principal components successively capture the maximum remaining variance of x, and that no SLC can capture more variance without being one of the principal components. PCA is thus a method of combining the original predictive variables using weights (derived from the eigenvalue decomposition) so that the maximum variance or correlation of the original data is captured. The magnitude of the λi's measures the variance captured by each principal component and should be used to select the first few components for a regression. When there is a high degree of correlation among the original predictive variables, only the first few principal components are likely to capture the majority of the variance of the original predictive variables.

Numerical Example of PCA
Description of the data:
- Data simulated for the BOP line of business
- Policy-year level data
- Target: claim frequency (claim count per $000 of premium)
- Six other policy variables were created:
  - fireProt – Fire Protection Class
  - numBldg – Number of Buildings in the Policy
  - numLoc – Number of Locations in the Policy
  - bldgAge – Maximum Building Age
  - bldgContents – Building Coverage Indicator
  - polage – Policy Age
All the predictive variables are treated as continuous, including the bldgContents variable. Both multivariate techniques described in this paper work only with continuous and ordinal variables; categorical variables cannot be directly analyzed by these methods for variable reduction.

Numerical Example of PCA (contd.)
Correlation matrix of the predictive variables: [correlation matrix not reproduced]. As can be seen, numBldg and numLoc are highly correlated, and fireProt has significant correlation with two other variables.
Principal component analysis steps:
Step 1: Compute the eigenvalue decomposition of the correlation matrix. The 6 eigenvalues were computed as: [eigenvalue table not reproduced]. The first 4 eigenvalues capture about 90% of the information in the correlation matrix.
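A sketch of Step 1 is given below; it assumes the six predictors sit in an array `X` (a placeholder for the BOP policy-year data described above) and reports the cumulative share of variance captured by the leading eigenvalues.

```python
import numpy as np

def pca_step1(X):
    """Eigen decomposition of the correlation matrix of X (n x p), largest eigenvalue first."""
    R = np.corrcoef(X, rowvar=False)             # p x p correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    cum_share = np.cumsum(eigvals) / eigvals.sum()
    return eigvals, eigvecs, cum_share           # per the slide, cum_share[3] is about 0.90
```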

Numerical Example of PCA (contd.)
Principal component analysis steps (contd.):
The eigenvectors (columns of the matrix Γ in the spectral decomposition) corresponding to each of the eigenvalues above are: [eigenvector table not reproduced].
Step 2: Construct the principal component corresponding to each eigenvalue by linearly combining the standardized predictive variables using the corresponding eigenvector. Hence the 1st principal component can be computed as:
PrinComp1 = - 0.336139581  * (fireProt     - 4.55789) / 2.4533790858
            + 0.6649848702 * (numBldg      - 1.10179) / 0.6234843087
            + 0.5610599572 * (numLoc       - 1.16947) / 0.4635645241
            - 0.313430401  * (bldgAge      - 48.5329) / 17.719473959
            + 0.1682134808 * (bldgContents - 2.36607) / 0.8750945166
            + 0.0590138772 * (polage       - 4.81878) / 3.1602055599
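Translated directly into code, the 1st principal component score looks like the sketch below; the coefficients, means, and standard deviations are the values quoted on the slide (not recomputed here).

```python
def prin_comp1(fireProt, numBldg, numLoc, bldgAge, bldgContents, polage):
    """1st principal component: eigenvector weights applied to the standardized predictors."""
    return (- 0.336139581  * (fireProt     - 4.55789) / 2.4533790858
            + 0.6649848702 * (numBldg      - 1.10179) / 0.6234843087
            + 0.5610599572 * (numLoc       - 1.16947) / 0.4635645241
            - 0.313430401  * (bldgAge      - 48.5329) / 17.719473959
            + 0.1682134808 * (bldgContents - 2.36607) / 0.8750945166
            + 0.0590138772 * (polage       - 4.81878) / 3.1602055599)
```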

Numerical Example of PCA (contd.)
Principal component analysis steps (contd.):
Step 4: Perform the regression/GLM analysis using the principal components instead of the original predictive variables. In this example, a Poisson regression with the IDENTITY link was performed on claim frequency using the 6 principal components as predictive variables. The summary of the regression is displayed below: [regression summary table not reproduced].
The p-values and chi-square statistics show that the first 3 principal components explain about 75% of the predictive power of the original 6 policy variables. Also, the ranking of predictive power did not line up with the order of the principal components.
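A sketch of such a fit with statsmodels is shown below; `pcs` (an n x 6 array of principal component scores) and `freq` (the claim frequency target) are assumed names, and the identity link class may be spelled `identity` in older statsmodels releases.

```python
import statsmodels.api as sm

def fit_poisson_identity(pcs, freq):
    """Poisson GLM with an identity link, using the principal components as predictors."""
    design = sm.add_constant(pcs)                # intercept plus the 6 component scores
    # Note: older statsmodels versions use sm.families.links.identity() instead.
    family = sm.families.Poisson(link=sm.families.links.Identity())
    result = sm.GLM(freq, design, family=family).fit()
    return result                                # result.summary() shows p-values and test statistics
```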

Partial Least Squares (PLS)
Partial least squares is another technique for determining linear combinations of the predictive variables. Unlike PCA, the PLS technique works by successively extracting factors from both the predictive and target variables such that the covariance between the extracted factors is maximized.
Description of the technique: assume X is an n×p matrix and Y is an n×q matrix. The PLS technique works by successively extracting factors from both X and Y such that the covariance between the extracted factors is maximized. PLS can work with multivariate response variables (i.e. when Y is an n×q matrix with q > 1); however, for our purpose we will assume a single response (target) variable, i.e. Y is n×1 and X is n×p as before.
PLS tries to find a linear decomposition of X and Y such that X = T P^T + E and Y = U Q^T + F, where
  T (n×r) = X-scores        U (n×r) = Y-scores
  P (p×r) = X-loadings      Q (1×r) = Y-loadings
  E (n×p) = X-residuals     F (n×1) = Y-residuals
The decomposition is chosen so as to maximize the covariance between T and U.

Partial Least Squares (PLS) (contd.)
Eigenvalue decomposition algorithm: each extracted x-score is a linear combination of X. For example, the first extracted x-score t of X has the form t = Xw, where w is the eigenvector corresponding to the first (largest) eigenvalue of X^T Y Y^T X. Similarly, the first y-score is u = Yc, where c is the eigenvector corresponding to the first eigenvalue of Y^T X X^T Y. Note that X^T Y denotes the covariance of X and Y.
Once the first factors have been extracted, we deflate the original values of X and Y as X1 = X - t t^T X and Y1 = Y - t t^T Y. The above process is then repeated to extract the 2nd PLS factor. The process continues until all possible latent factors t and u have been extracted, i.e. when X is reduced to a null matrix. The number of latent factors extracted depends on the rank of X.
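A compact sketch of this extraction loop for a single response is given below. It follows the description above, with the small practical caveat that the x-score is rescaled to unit length before deflation so that X - t t^T X actually removes the extracted direction; the names and structure are illustrative, not the authors' code.

```python
import numpy as np

def pls_scores(X, Y, n_factors):
    """Extract PLS x-scores via eigen decomposition of X^T Y Y^T X with deflation."""
    X, Y = X.astype(float).copy(), Y.reshape(-1, 1).astype(float).copy()
    scores = []
    for _ in range(n_factors):
        M = X.T @ Y @ Y.T @ X                        # p x p matrix X^T Y Y^T X
        eigvals, eigvecs = np.linalg.eigh(M)
        w = eigvecs[:, np.argmax(eigvals)]           # eigenvector of the largest eigenvalue
        t = X @ w                                    # x-score: linear combination of predictors
        scores.append(t)
        tn = (t / np.linalg.norm(t)).reshape(-1, 1)  # unit-length copy used for deflation
        X = X - tn @ (tn.T @ X)                      # deflate X
        Y = Y - tn @ (tn.T @ Y)                      # deflate Y
    return np.column_stack(scores)                   # one column per extracted factor
```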

Numerical Example of PLS
The same simulated BOP dataset was used for the PLS numerical example. Using the eigenvalue decomposition algorithm discussed earlier, the first step is to compute the covariance X^T Y. The covariances between the six predictive variables and the target variable are: [values not reproduced].
As noted, the first PLS factor can be computed from the eigenvalue decomposition of the matrix X^T Y Y^T X. The X^T Y Y^T X matrix is: [matrix not reproduced].

Numerical Example of PLS (contd.)
The first eigenvector of the eigenvalue decomposition of the X^T Y Y^T X matrix is: {-0.1588680, -0.6501667, -0.6831309, -0.1495317, -0.2056388, 0.1439770}. The first PLS x-score is determined by linearly combining the standardized predictive variables using these values:
Xscr1 = - 0.1588680 * (fireProt     - 4.55789) / 2.4533790858
        - 0.6501667 * (numBldg      - 1.10179) / 0.6234843087
        - 0.6831309 * (numLoc       - 1.16947) / 0.4635645241
        - 0.1495317 * (bldgAge      - 48.5329) / 17.719473959
        - 0.2056388 * (bldgContents - 2.36607) / 0.8750945166
        + 0.1439770 * (polage       - 4.81878) / 3.1602055599
Once the first factor has been extracted, the original X and Y are deflated by Xscr1 Xscr1^T times the original X and Y values. The eigenvalue decomposition is then performed on the deflated values, until all factors have been extracted.
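In code, the first PLS x-score is the same kind of weighted sum as the first principal component, only with the eigenvector above; the sketch below uses the coefficients, means, and standard deviations quoted on the slide.

```python
def pls_xscore1(fireProt, numBldg, numLoc, bldgAge, bldgContents, polage):
    """First PLS x-score: the first eigenvector applied to the standardized predictors."""
    return (- 0.1588680 * (fireProt     - 4.55789) / 2.4533790858
            - 0.6501667 * (numBldg      - 1.10179) / 0.6234843087
            - 0.6831309 * (numLoc       - 1.16947) / 0.4635645241
            - 0.1495317 * (bldgAge      - 48.5329) / 17.719473959
            - 0.2056388 * (bldgContents - 2.36607) / 0.8750945166
            + 0.1439770 * (polage       - 4.81878) / 3.1602055599)
```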

Numerical Example of PLS (contd.)
Finally, a Poisson regression with the IDENTITY link was performed on claim frequency using the extracted PLS factors. The regression statistics are displayed below: [regression summary table not reproduced]. Comparing these to the chi-square statistics from the GLM using PCA, we can see that the PLS factors are extracted in order of significance and predictive power.

A Simulation Study
In this section we compare the performance of the two methods using a simulation study and discuss their relative advantages and disadvantages.
Design of the simulation study: a number of simulated datasets were created by re-sampling from the original data. PCA and PLS analyses were performed on these samples, and the chi-square statistics of the extracted PCA factors and PLS factors were compared. The exhibit below shows the results for 3 such samples: [results table not reproduced]. We can see from the table that the chi-square statistics of the first two PLS factors are always larger than those of the corresponding first two PCA factors, i.e. the PLS factors capture more information.

Conclusion
PCA and PLS serve two purposes in regression analysis. First, both techniques convert a set of highly correlated variables into a set of independent variables through linear transformations. Second, both techniques are used for variable reduction. When a dependent variable for the regression is specified, the PLS technique is more efficient than the PCA technique for dimension reduction because of the supervised nature of its algorithm.