Chapter 12 – Discriminant Analysis


Chapter 12 – Discriminant Analysis Data Mining for Business Analytics Shmueli, Patel & Bruce

Discriminant Analysis: Background
- A classical statistical technique, used for classification long before data mining:
  - Classifying organisms into species
  - Classifying skulls
  - Fingerprint analysis
- Also used for business data mining (loans, customer types, etc.)
- Can also be used to highlight aspects that distinguish classes (profiling)

Small Example: Riding Mowers
- Goal: classify purchase behavior (buy/no-buy) of riding mowers based on income and lot size
- Outcome: owner or non-owner (0/1)
- Predictors: lot size, income

Can we manually draw a line that separates owners from non-owners?

Example: Loan Acceptance
- In the prior small example, separation is clear. In data mining applications, there will be more records, more predictors, and less clear separation.
- Consider the Universal Bank example with only 2 predictors:
  - Outcome: accept / don't accept loan
  - Predictors: annual income (Income), avg. monthly credit card spending (CCAvg)

Sample of 200 customers

5000 customers

Algorithm for Discriminant Analysis

The Idea
- To classify a new record, measure its distance from the center of each class
- Then classify the record to the closest class

Step 1: Measuring Distance
- Measure each record's distance from the center of each class
- The center of a class is called a centroid
- The centroid is simply a vector (list) of the means of each of the predictors, computed from all the records that belong to that class

Step 1: Measuring Distance – cont.
- A popular distance metric is Euclidean distance (used with k-NN). We can use it to measure the distance of a record x from a class centroid x̄:
  D(x, x̄) = √[(x1 − x̄1)² + … + (xp − x̄p)²]
- Drawbacks:
  - Sensitive to scale and variance (normalizing can correct this)
  - Ignores correlation between variables
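A minimal sketch of this centroid-distance idea, using made-up income and lot-size values (not the book's dataset): compute each class centroid as the vector of predictor means, then assign a new record to the class with the nearest centroid under Euclidean distance.

```python
import numpy as np

# Hypothetical training records: columns are (income in $000s, lot size in 000s sq. ft.)
owners = np.array([[60.0, 18.4], [85.5, 16.8], [64.8, 21.6]])
nonowners = np.array([[75.0, 19.6], [52.8, 20.8], [64.8, 17.2]])

# Class centroids: the vector of predictor means within each class
centroids = {"owner": owners.mean(axis=0), "nonowner": nonowners.mean(axis=0)}

def euclidean_classify(x):
    """Assign x to the class whose centroid is nearest in Euclidean distance."""
    return min(centroids, key=lambda c: np.linalg.norm(x - centroids[c]))

print(euclidean_classify(np.array([61.5, 20.8])))
```

Note that because income and lot size are on very different scales, the slide's caveat applies: income dominates the raw Euclidean distance unless the predictors are normalized first.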

Instead, use "statistical (Mahalanobis) distance":
  D(x, x̄) = √[(x − x̄)ᵀ S⁻¹ (x − x̄)]
where ᵀ denotes the transpose (converting a column to a row) and S⁻¹ is the inverse of the covariance matrix S (the p-dimensional extension of division).
- For a single predictor (p = 1), this reduces to a z-score
- When p > 1, statistical distance takes account of correlations among predictors (a z-score doesn't)
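The formula above can be sketched directly in NumPy. The p = 1 case makes the z-score reduction concrete: with variance 4 (standard deviation 2), the statistical distance of 8 from a mean of 5 is |8 − 5| / 2 = 1.5.

```python
import numpy as np

def mahalanobis(x, centroid, S):
    """Statistical distance: sqrt((x - m)^T S^{-1} (x - m))."""
    d = x - centroid
    return float(np.sqrt(d @ np.linalg.inv(S) @ d))

# With p = 1 this reduces to a z-score: |x - mean| / sd
x = np.array([8.0])
m = np.array([5.0])
S = np.array([[4.0]])          # variance 4, i.e. sd 2
print(mahalanobis(x, m, S))    # 1.5, same as |8 - 5| / 2
```

For p > 1, passing the class covariance matrix as S is what lets the distance discount directions in which the predictors vary together.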

Step 2: Classification Functions
- The idea is to create a classification score that reflects the distance from each class
- This is done by estimating "classification functions", which are a function of the statistical distances
- The estimation maximizes the ratio of between-class to within-class variability
- Fisher's linear classification functions: one for each class, used to compute a classification score
- Classify a record to the class with the highest score

Classification Functions (XLMiner output)
Record #1: income = $60K, lot size = 18.4K sq. ft.
- Owner score = -73.16 + (0.43)(60) + (5.47)(18.4) = 53.2
- Non-owner score = -51.42 + (0.33)(60) + (4.68)(18.4) = 54.48
- The "non-owner" score is higher → (mis)classify as non-owner
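The scores on this slide can be reproduced from the printed coefficients. Because the coefficients shown are rounded, the computed scores (about 53.29 and 54.49) differ from the slide's values in the second decimal, but the classification is the same.

```python
# Fisher classification functions, using the rounded coefficients from the slide
def owner_score(income, lot_size):
    return -73.16 + 0.43 * income + 5.47 * lot_size

def nonowner_score(income, lot_size):
    return -51.42 + 0.33 * income + 4.68 * lot_size

# Record #1: income = $60K, lot size = 18.4K sq. ft.
s_own = owner_score(60, 18.4)
s_non = nonowner_score(60, 18.4)
label = "owner" if s_own > s_non else "non-owner"
print(round(s_own, 2), round(s_non, 2), label)
```

Since the non-owner score is higher, this actual owner is misclassified as a non-owner, exactly as the slide notes.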

Classification scores for part of Mowers data

Step 3: Converting to Probabilities
- Classification scores can be converted to probabilities of belonging to a class. The probability that record i (with predictor values x1, x2, …, xp) belongs to class k is:
  P(record i belongs to class k) = e^(score_k) / (e^(score_1) + … + e^(score_m))
- The probability is then compared to the cutoff value in order to classify the record
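A sketch of this conversion, applied to the scores from the mower example above. Subtracting the maximum score before exponentiating leaves the ratio unchanged but keeps exp() from overflowing when scores are large (as they are here).

```python
import math

def class_probabilities(scores):
    """Convert classification scores to class-membership probabilities:
    P(class k) = exp(score_k) / sum_j exp(score_j)."""
    m = max(scores.values())                 # shift for numerical stability
    exp_scores = {k: math.exp(s - m) for k, s in scores.items()}
    total = sum(exp_scores.values())
    return {k: v / total for k, v in exp_scores.items()}

# Scores for record #1 of the mower example
probs = class_probabilities({"owner": 53.2, "nonowner": 54.48})
print(probs)
```

With the default cutoff of 0.5, the non-owner probability (well above 0.5) again yields a non-owner classification.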

Line from model (plus ad-hoc line)

Assumptions & Caveats of Discriminant Analysis
- Assumes multivariate normality of predictors
  - When this condition is met, DA is more efficient than other methods (i.e., needs less data to obtain similar accuracy)
  - Even when it is not met, DA is robust when the smallest class has enough cases (> 20). This means it can be used with dummy variables!
- Assumes the correlation among predictors within a class is the same across all classes (compare the correlation tables of each class by eye)
- Sensitive to outliers

Assessing Predictive Performance
- As in other classification methods:
  - Confusion matrix
  - Lift
- Based on validation data

Improving Classifications

Prior Probabilities
- If classes are not equally frequent, or their frequency in the sample does not reflect reality, then classification functions can be improved
- How? Incorporate prior (or real) probabilities of class membership:
  - Add log(p_j) to the classification function for class j, where p_j is the probability that a case belongs to class j

Example - Mowers
- Sample contains 50% owners, but suppose in the population only 15% are owners (i.e., 0.15 probability of being an owner)
- Existing classification function constants:
  - Owners: -73.16
  - Non-owners: -51.42
- Adjusted for prior probabilities:
  - Owners: -73.16 + log(0.15) = -75.06
  - Non-owners: -51.42 + log(0.85) = -51.58
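The adjustment is just an additive shift of each class's constant, so it is easy to reproduce (natural log, as in the slide's arithmetic: -73.16 + ln(0.15) ≈ -75.06):

```python
import math

def adjust_for_priors(constants, priors):
    """Add ln(p_j) to each class's classification-function constant."""
    return {k: c + math.log(priors[k]) for k, c in constants.items()}

constants = {"owner": -73.16, "nonowner": -51.42}
priors = {"owner": 0.15, "nonowner": 0.85}

adjusted = adjust_for_priors(constants, priors)
print(adjusted)
```

Because ln(0.15) is much more negative than ln(0.85), the rare class (owners) needs a larger raw score than before to win, which is exactly the intended effect of incorporating priors.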

Unequal Misclassification Costs
- For the two-class (buyer/non-buyer) case, we can account for asymmetric costs of misclassification (C1, C2) in the same fashion as for unequal prior probabilities
- How? Add log(C1) and log(C2) to the class constants
- Often absolute costs are unknown; instead, use the cost ratio:
  - Set C1 = 1, C2 = the ratio
  - Add log(C2/C1) to class 2's constant

Unequal Misclassification Costs

Actual class | Misclassification   | Misclassification cost | Adjustment
C1           | classified as C2    | q1                     | add log(q1) to the constant for class 1
C2           | classified as C1    | q2                     | add log(q2) to the constant for class 2
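A sketch of the cost-ratio version described above, using a hypothetical ratio (not a value from the book): with C1 set to 1, only class 2's constant moves, by log of the ratio.

```python
import math

def adjust_for_cost_ratio(constant_class2, cost_ratio):
    """With only the cost ratio C2/C1 known, set C1 = 1 and add
    ln(C2/C1) to class 2's classification-function constant."""
    return constant_class2 + math.log(cost_ratio)

# Hypothetical: misclassifying a class-2 record is 5x as costly as class-1
new_constant = adjust_for_cost_ratio(-51.42, 5.0)
print(new_constant)
```

Raising class 2's constant makes its score win more often, so fewer of the expensive class-2 misclassifications occur, at the price of more class-1 errors.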

Multiple Classes
- The same procedure is used for multiple classes
- One classification function for each class
- A case is assigned to the class whose function has the highest value
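The multi-class rule above is a simple argmax over the per-class functions. A sketch with three classes, using invented coefficients and class labels matching the accident example that follows (these numbers are illustrative, not estimates from the book's data):

```python
import numpy as np

# Hypothetical classification functions, one per class:
#   score_k(x) = intercept_k + coefs_k . x
intercepts = np.array([-2.0, -1.0, -3.0])
coefs = np.array([[0.5, 1.0],
                  [1.2, 0.3],
                  [0.2, 2.0]])
classes = ["no injury", "non-fatal", "fatal"]

def classify(x):
    """Score x with every class's function and return the highest-scoring class."""
    scores = intercepts + coefs @ x
    return classes[int(np.argmax(scores))]

print(classify(np.array([1.0, 2.0])))
```

Nothing in the rule is special to two classes: the same score-and-argmax step works for any number of classification functions.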

Example: Auto Accidents
- Outcome (3 classes): no injury, non-fatal injury, fatal injury
- Predictors: time of day, day of week, weather, type of road, road surface conditions, …

Accident Example: Data Sample

Classification Functions

Scores for first few records
- For row #2, the "non-fatal" score is highest, so the record is classified as non-fatal
- The next slide shows these scores plus the estimated probabilities

XLMiner output: scores & probabilities

Summary
- Discriminant analysis is based on measuring the distance of a record from the class centers
- The distance metric used is statistical distance, which takes into account the correlations between predictors
- Suitable for small datasets
- Assumptions: equal correlations within each class, and normality (but fairly robust to violation of normality)
- Sensitive to outliers (explore the data!)
- Classification functions are useful for profiling: predictors can be ordered in terms of how well they separate the classes