Dimension Reduction in Workers Compensation. CAS Predictive Modeling Seminar. Louise Francis, FCAS, MAAA, Francis Analytics and Actuarial Data Mining, Inc.



Objectives: Answer the questions: what is dimension reduction, and why use it? Introduce key methods of dimension reduction. Illustrate with examples in Workers Compensation. There will be some formulas, but the emphasis is on insight into the basic mechanisms of the procedures.

Introduction: “How do mere observations become data for analysis?” “Specific variable values are never immutable characteristics of the data.” (Jacoby, Data Theory and Dimension Analysis, Sage Publications.) Many of the dimension reduction and measurement techniques originated in the social sciences and dealt with how to create scales from responses on attitudinal and opinion surveys.

Unsupervised learning: Dimension reduction methods are generally unsupervised learning. Supervised learning: there is a dependent or target variable. Unsupervised learning: there is no target variable; like variables or like records are grouped together.

The Data: BLS economic indexes: components of inflation, employment data, health insurance inflation. Texas Department of Insurance closed claim data for 2002 and 2003: employment-related injury, excludes small claims, about 1800 records.

What is a dimension? Jacoby: the number of separate and interesting sources of variation. In many studies each variable is a dimension. However, we can also view each record in a database as a dimension.

Dimensions

The Two Major Categories of Dimension Reduction: Variable reduction: factor analysis, principal components analysis. Record reduction: clustering. Other methods tend to be developments on these.

Principal Components Analysis: A form of dimension (variable) reduction. Suppose we want to combine all the information related to the “inflation” dimension of insurance costs: medical care costs; employment (wage) costs; other (energy, transportation, services).

Principal Components: These variables are correlated but not perfectly correlated. We replace many variables with a weighted sum of the variables. These weighted sums are then used as independent variables in a predictive model.
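As a rough illustration of the weighted-sum idea, the sketch below standardizes a few made-up index series, takes the leading eigenvector of their correlation matrix as the weights, and forms the first principal component score. The numbers are hypothetical (not the BLS data from the slides), the function name is invented for illustration, and NumPy is assumed to be available.

```python
import numpy as np

# Hypothetical index values for three correlated inflation measures
# (rows = periods; columns = medical, wage, energy). Not the BLS data.
X = np.array([
    [100.0, 100.0, 100.0],
    [104.0, 102.0, 109.0],
    [109.0, 105.0, 103.0],
    [113.0, 107.0, 118.0],
    [118.0, 110.0, 121.0],
    [124.0, 112.0, 130.0],
])


def first_principal_component(X):
    """Return (weights, scores, share) for the first principal component."""
    # Standardize each column so the analysis is based on correlations.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = np.corrcoef(X, rowvar=False)
    # eigh returns eigenvalues in ascending order for a symmetric matrix.
    eigenvalues, eigenvectors = np.linalg.eigh(R)
    weights = eigenvectors[:, -1]
    # The sign of an eigenvector is arbitrary; make the weights sum positive.
    if weights.sum() < 0:
        weights = -weights
    scores = Z @ weights  # the weighted sum of the standardized variables
    share = eigenvalues[-1] / eigenvalues.sum()
    return weights, scores, share


weights, scores, share = first_principal_component(X)
print("weights:", np.round(weights, 3))
print("share of variance explained:", round(float(share), 3))
```

The `scores` column is what would replace the three original series as a single predictor in a downstream model.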

Factor Analysis: A Latent Factor

Factor/Principal Components Analysis: Linear methods that use the linear correlation matrix. The correlation matrix is decomposed to find a smaller number of factors that are related to the same underlying drivers. Highly correlated variables tend to load highly on the same factor.

Factor/Principal Components Analysis

Uses eigenvectors and eigenvalues: R is the correlation matrix, V the eigenvectors, and lambda the eigenvalues.
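The formula from the slide did not survive in the transcript. The standard eigendecomposition that these symbols usually refer to is written out below, followed by a two-variable worked case; this is a reconstruction from the symbol definitions, not a copy of the original slide.

```latex
% Eigendecomposition of the correlation matrix R (n variables)
R V = V \Lambda, \qquad R = V \Lambda V^{\top}, \qquad
\Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_n)

% Worked case: two variables with correlation r
R = \begin{pmatrix} 1 & r \\ r & 1 \end{pmatrix}, \quad
\lambda_1 = 1 + r, \; v_1 = \tfrac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \quad
\lambda_2 = 1 - r, \; v_2 = \tfrac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ -1 \end{pmatrix}
```

With r = 0.8, for example, the first component carries (1 + 0.8)/2 = 90% of the total variance.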

Inflation Data

Factor Rotation: Find simpler, more easily interpretable factors. Use the notion of factor complexity.

Factor Rotation: Quartimax rotation: maximize q. Varimax rotation: maximizes the variance of squared loadings for each factor rather than for each variable.
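A minimal sketch of varimax rotation follows, using a commonly seen SVD-based iteration rather than whatever software produced the slides. The loading matrix is invented for illustration: a clean two-factor pattern deliberately spun by 30 degrees so that the rotation has something to undo. NumPy is assumed, and the function names are hypothetical.

```python
import numpy as np


def varimax_criterion(loadings):
    """Sum over factors of the variance of the squared loadings."""
    return float(np.sum(np.var(loadings ** 2, axis=0)))


def varimax(loadings, max_iter=100, tol=1e-8):
    """Rotate a (variables x factors) loading matrix toward the varimax criterion."""
    p, k = loadings.shape
    rotation = np.eye(k)
    previous = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        column_ss = np.sum(rotated ** 2, axis=0)
        # Gradient-like target for the varimax criterion.
        target = rotated ** 3 - rotated * (column_ss / p)
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt  # nearest orthogonal matrix to the target direction
        current = s.sum()
        if current < previous * (1 + tol):
            break
        previous = current
    return loadings @ rotation, rotation


# Hypothetical loadings: a simple two-factor structure rotated by 30 degrees.
simple = np.array([[0.9, 0.0], [0.8, 0.0], [0.7, 0.0],
                   [0.0, 0.9], [0.0, 0.8], [0.0, 0.7]])
theta = np.pi / 6
spin = np.array([[np.cos(theta), -np.sin(theta)],
                 [np.sin(theta), np.cos(theta)]])
unrotated = simple @ spin

rotated, rotation = varimax(unrotated)
print(np.round(rotated, 3))
```

Because the rotation is orthogonal, each variable's communality (row sum of squared loadings) is unchanged; only the way it is split across factors moves.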

Varimax Rotation

Plot of Loadings on Factors

How Many Factors to Keep? Eigenvalues provide information on how much variance is explained. Proportion explained by a given component = corresponding eigenvalue / n, where n is the number of variables. Use a scree plot. Rule of thumb: keep all factors with eigenvalues > 1.
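The two rules on this slide are simple enough to sketch in a few lines. The eigenvalues below are hypothetical; for a correlation matrix of five variables they must sum to five, which is why dividing by n gives a proportion.

```python
def variance_explained(eigenvalues):
    """Proportion of variance per component: eigenvalue / n."""
    n = len(eigenvalues)  # eigenvalues of a correlation matrix sum to n
    return [ev / n for ev in eigenvalues]


def kaiser_rule(eigenvalues):
    """Number of factors to keep under the eigenvalue > 1 rule of thumb."""
    return sum(1 for ev in eigenvalues if ev > 1.0)


# Hypothetical eigenvalues for five standardized variables (they sum to 5).
eigenvalues = [2.9, 1.2, 0.5, 0.3, 0.1]

proportions = variance_explained(eigenvalues)
cumulative = [sum(proportions[: i + 1]) for i in range(len(proportions))]
factors_to_keep = kaiser_rule(eigenvalues)

for ev, prop, cum in zip(eigenvalues, proportions, cumulative):
    print(f"eigenvalue {ev:4.1f}  proportion {prop:.2f}  cumulative {cum:.2f}")
print("factors kept:", factors_to_keep)
```

Plotting `eigenvalues` against component number gives the scree plot; the "elbow" in that plot is the other common stopping guide.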

WC Severity vs Factor 1

WC Severity vs Factor 2

What About Categorical Data? Factor analysis is performed on numeric data. You could code the data as binary dummy variables. Categorical variables from the Texas data: injury, cause of loss, business class, health insurance (Y/N).
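Dummy coding can be sketched in plain Python as below. The injury labels are invented examples, not values from the Texas data, and `dummy_code` is a hypothetical helper name.

```python
def dummy_code(values, drop_first=False):
    """Turn a categorical column into 0/1 indicator columns, one per level."""
    levels = sorted(set(values))
    if drop_first:
        # Dropping one level avoids indicator columns that always sum to 1.
        levels = levels[1:]
    return {level: [1 if v == level else 0 for v in values] for level in levels}


# Hypothetical injury codes for four claims.
injuries = ["Strain", "Fracture", "Strain", "Burn"]

full = dummy_code(injuries)
reduced = dummy_code(injuries, drop_first=True)

for level, column in full.items():
    print(level, column)
```

With every level kept, the indicator columns are perfectly collinear, which makes the correlation matrix singular; that is the usual reason to drop one level before a factor analysis.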

Optimal Scaling: A method of dealing with categorical variables. Can be used to model nonlinear relationships. Uses regression to assign numbers to categories and fit regression coefficients: Y* = f(X*). In each round of fitting, a new Y* and X* are created.

Variable Correlations

Visualizations of Scaled Variables

Can we use scaled variables in prediction?

Tree Using Optimal Scaling Scores

Tree for Subrogation

Row Reduction: Cluster Analysis: Records are grouped into categories that have similar values on the variables. Examples: Marketing: people with similar values on demographic variables (e.g., age, gender, income) may be grouped together for marketing. Text analysis: use words that tend to occur together to classify documents. Fraud modeling. Territory definition. Note: no dependent variable is used in the analysis.

Clustering: Common methods: k-means, hierarchical. No dependent variable: records are grouped into classes with similar values on the variables. Start with a measure of similarity or dissimilarity. Maximize dissimilarity between members of different clusters.
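A bare-bones k-means loop, in plain Python, shows the mechanics. The points stand in for two numeric claim variables and are made up; the starting centers are supplied by hand to keep the run deterministic, whereas real implementations usually randomize them. Note that k-means works by minimizing distance to the cluster center, which is the mirror image of maximizing separation between clusters.

```python
import math


def kmeans(points, initial_centers, max_iter=100):
    """Alternate assignment and center updates until the labels stop changing."""
    centers = [tuple(c) for c in initial_centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center (Euclidean).
        new_labels = [
            min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centers


# Hypothetical (report lag, settlement lag) pairs in arbitrary units.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
labels, centers = kmeans(points, initial_centers=[points[0], points[3]])
print(labels)
print(centers)
```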

Dissimilarity (Distance) Measures for Continuous Variables: Euclidean distance; Manhattan distance.
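The slide's formulas are not in the transcript, so here are the textbook definitions of the two named distances as a small Python sketch. The sample records are arbitrary.

```python
import math


def euclidean_distance(x, y):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def manhattan_distance(x, y):
    """Sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


record_a = (0.0, 0.0)
record_b = (3.0, 4.0)
print(euclidean_distance(record_a, record_b))  # 5.0
print(manhattan_distance(record_a, record_b))  # 7.0
```

In practice the variables are usually standardized first, otherwise the one with the largest scale dominates either distance.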

Binary Variables

Simple Matching; Rogers and Tanimoto
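Assuming the slide refers to the usual simple matching and Rogers-Tanimoto coefficients for binary variables, both can be sketched from three counts: joint presences, joint absences, and mismatches. The two records below are invented. Both functions return similarities; one minus the value gives a dissimilarity.

```python
def match_counts(x, y):
    """Count joint 1s, joint 0s, and mismatches between two binary records."""
    both_one = sum(1 for p, q in zip(x, y) if p == 1 and q == 1)
    both_zero = sum(1 for p, q in zip(x, y) if p == 0 and q == 0)
    mismatches = sum(1 for p, q in zip(x, y) if p != q)
    return both_one, both_zero, mismatches


def simple_matching(x, y):
    """Share of variables on which the two records agree."""
    a, d, m = match_counts(x, y)
    return (a + d) / (a + d + m)


def rogers_tanimoto(x, y):
    """Like simple matching, but mismatches carry double weight."""
    a, d, m = match_counts(x, y)
    return (a + d) / (a + d + 2 * m)


x = [1, 0, 1, 1, 0]
y = [1, 1, 0, 1, 0]
print(simple_matching(x, y))   # 3 agreements out of 5
print(rogers_tanimoto(x, y))   # 3 / (3 + 2 * 2)
```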

Example: Texas Data: Data from the 2002 and 2003 closed claim database of the Texas Department of Insurance. Only claims over a threshold are included. Variables used for clustering: report lag, settlement lag, county (ranked by how often it appears in the data), injury, cause of loss, business class.

Results Using Only Numeric Variables: Used the Euclidean distance measure.

Two Stage Clustering With Categorical Variables: First compute dissimilarity measures. Then get clusters. Find the optimum number of clusters.
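The two stages can be sketched as follows: a dissimilarity matrix built from the share of mismatched categories, then a simple average-linkage merge. Both choices are assumptions made for illustration, since the transcript does not say which measure or clustering routine was used. The records are invented, and the number of clusters is fixed by hand rather than optimized.

```python
def mismatch_dissimilarity(r1, r2):
    """Share of categorical fields on which two records differ."""
    return sum(1 for a, b in zip(r1, r2) if a != b) / len(r1)


def dissimilarity_matrix(records):
    """Stage 1: pairwise dissimilarities between all records."""
    n = len(records)
    return [[mismatch_dissimilarity(records[i], records[j]) for j in range(n)]
            for i in range(n)]


def agglomerate(D, k):
    """Stage 2: merge the closest clusters (average linkage) until k remain."""
    clusters = [[i] for i in range(len(D))]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pairs = [D[p][q] for p in clusters[i] for q in clusters[j]]
                average = sum(pairs) / len(pairs)
                if best is None or average < best[0]:
                    best = (average, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical (injury, cause of loss, business class) records.
records = [
    ("Strain", "Lifting", "Retail"),
    ("Strain", "Lifting", "Warehouse"),
    ("Fracture", "Fall", "Construction"),
    ("Fracture", "Fall", "Roofing"),
    ("Strain", "Lifting", "Retail"),
]

D = dissimilarity_matrix(records)
clusters = agglomerate(D, k=2)
print(clusters)
```

Choosing the number of clusters would need a further step, such as comparing a fit statistic across several values of k, which this sketch leaves out.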

Loadings of Injuries on Cluster

Age and Cluster

County vs Cluster

Means of Financial Variables by Cluster

Tying Things Together: Multidimensional Scaling: A mathematical way to connect clustering and factor analysis. Data can be decomposed into key row dimensions times a diagonal weight matrix times key column dimensions.
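The decomposition described here reads like the singular value decomposition, so the sketch below uses it on that assumption. The small matrix is arbitrary, and NumPy is assumed to be available.

```python
import numpy as np

# Arbitrary 4-record by 3-variable data matrix.
X = np.array([
    [2.0, 0.0, 1.0],
    [1.0, 3.0, 0.0],
    [0.0, 1.0, 4.0],
    [3.0, 1.0, 1.0],
])

# U: row (record) dimensions, s: diagonal weights, Vt: column (variable) dimensions.
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Multiplying the three pieces back together recovers the data.
X_rebuilt = U @ np.diag(s) @ Vt

# Keeping only the k largest weights gives a reduced-dimension approximation.
k = 2
X_rank2 = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

print("singular values:", np.round(s, 3))
print("approximation error:", round(float(np.linalg.norm(X - X_rank2)), 3))
```

The rows of `U` place records in the reduced space (the clustering view), while the rows of `Vt` place variables in it (the factor analysis view), which is one way to see the link the slide points to.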

Modern dimension reduction: The hidden layer in neural networks is like a nonlinear principal components analysis. Projection pursuit regression: a nonlinear PCA. Kohonen self-organizing maps: a kind of neural network that does clustering. These can be understood as enhancements of factor analysis or clustering.

Kohonen SOM for Fraud

Recommended References: Hatcher, 1994, A Step-by-Step Approach to Using the SAS System for Factor Analysis and Structural Equation Modeling, SAS Publications. Jacoby, 1991, Data Theory and Dimension Analysis, Sage Publications. Kaufman and Rousseeuw, 1990, Finding Groups in Data, Wiley. Kim and Mueller, 1978, Factor Analysis: Statistical Methods and Practical Issues, Sage Publications.

Questions?