Chapter 4: Predictive Modeling

Chapter 4: Predictive Modeling 4.1 Introduction to Predictive Modeling 4.2 Predictive Modeling Using Decision Trees 4.3 Predictive Modeling Using Logistic Regression 4.4 Churn Case Study 4.5 A Note about Model Management 4.6 Recommended Reading

Objectives Explain the concepts of predictive modeling. Illustrate the modeling essentials of a predictive model. Explain the importance of data partitioning.

Catalog Case Study Analysis Goal: A mail-order catalog retailer wants to save money on mailing and increase revenue by targeting mailed catalogs to customers most likely to purchase in the future. Data set: CATALOG2010 Number of rows: 48,356 Number of columns: 98 Contents: sales figures summarized across departments and quarterly totals for 5.5 years of sales Targets: RESPOND (binary) ORDERSIZE (continuous)

Where You’ve Been, Where You’re Going… With basic descriptive modeling techniques (RFM), you identified customers who might be profitable. Sophisticated predictive modeling techniques can produce risk scores for current customers, profitable prospects from outside the customer database, cross-sell and up-sell lists, and much more. Scoring techniques based on predictive models can be implemented in real-time data collection systems, automating the process of fact-based decision making.

Descriptive Modeling Tells You about Now: Descriptive statistics inform you about your sample. This information is important for reacting to things that have happened in the past. (Diagram: past behavior → fact-based reports → current state of the customer.)

From Descriptive to Predictive Modeling: Predictive modeling techniques, paired with scoring and good model management, enable you to use your data about the past and the present to make good decisions for the future. (Diagram: past behavior → fact-based predictions → strategy.)

Predictive Modeling Terminology: In a training data set, the variables are called inputs and targets, and the observations are known as training cases.

Predictive Model: a concise representation of the association between the inputs and the target in the training data set.

Predictions: the output of the predictive model given a set of input measurements.

Modeling Essentials Determine type of prediction. Select useful inputs. Optimize complexity.

Three Prediction Types: decisions, rankings, and estimates.

Decision Predictions: A predictive model uses input measurements to make the best decision for each case, for example assigning it to a primary, secondary, or tertiary category.

Ranking Predictions: A predictive model uses input measurements to optimally rank each case, for example with scores such as 720, 630, 580, 520, and 470.

Estimate Predictions: A predictive model uses input measurements to optimally estimate the target value, for example probabilities such as 0.75, 0.65, 0.54, 0.33, and 0.28.

Idea Exchange Think of two or three business problems that would require each of the three types of prediction. What would require a decision? How would you obtain information to help you in making a decision based on a model score? What would require a ranking? How would you use this ranking information? What would require an estimate? Would you estimate a continuous quantity, a count, a proportion, or some other quantity?

Modeling Essentials – Predict Review: Determine type of prediction (decide, rank, and estimate). Select useful inputs. Optimize complexity.

Modeling Essentials Determine type of prediction. Select useful inputs. Optimize complexity.

Input Reduction Strategies: remove redundancy and irrelevancy among the inputs.

Input Reduction – Redundancy: Input x2 has the same information as input x1. Example: x1 is household income and x2 is home value.

Input Reduction – Irrelevancy: Predictions change with input x4 but much less with input x3. Example: the target is response to a direct mail solicitation, x3 is religious affiliation, and x4 is response to previous solicitations.

Modeling Essentials – Select Review: Determine type of prediction (decide, rank, and estimate). Select useful inputs (eradicate redundancies and irrelevancies). Optimize complexity.

Modeling Essentials: Determine type of prediction. Select useful inputs. Optimize complexity.

Data Partitioning: Partition the available data into training and validation sets. The model is fit on the training data set, and model performance is evaluated on the validation data set.
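
As a rough illustration of this step outside SAS Enterprise Miner, the sketch below uses Python and scikit-learn; the file name is hypothetical, and the RESPOND and ORDERSIZE columns follow the catalog case study.

    import pandas as pd
    from sklearn.model_selection import train_test_split

    # Hypothetical flat-file export of the CATALOG2010 data set.
    catalog = pd.read_csv("catalog2010.csv")

    inputs = catalog.drop(columns=["RESPOND", "ORDERSIZE"])   # candidate inputs
    target = catalog["RESPOND"]                               # binary target

    # Hold out a validation partition; stratifying keeps the proportion
    # of responders roughly equal in the two partitions.
    X_train, X_valid, y_train, y_valid = train_test_split(
        inputs, target, test_size=0.3, random_state=42, stratify=target)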

Predictive Model Sequence: Create a sequence of models of increasing complexity from the training data.

Model Performance Assessment: Rate each model's performance using the validation data.

Model Selection: Select the simplest model with the highest validation assessment.

4.01 Multiple Choice Poll: The best model is the (a) simplest model with the best performance on the training data; (b) simplest model with the best performance on the validation data; (c) most complex model with the best performance on the training data; (d) most complex model with the best performance on the validation data.

4.01 Multiple Choice Poll – Correct Answer: (b) the simplest model with the best performance on the validation data.

Modeling Essentials – Optimize Review: Determine type of prediction (decide, rank, and estimate). Select useful inputs (eradicate redundancies and irrelevancies). Optimize complexity (tune models with validation data).

Chapter 4: Predictive Modeling 4.1 Introduction to Predictive Modeling 4.2 Predictive Modeling Using Decision Trees 4.3 Predictive Modeling Using Logistic Regression 4.4 Churn Case Study 4.5 A Note about Model Management 4.6 Recommended Reading

Objectives Explain the concept of decision trees. Illustrate the modeling essentials of decision trees. Construct a decision tree predictive model in SAS Enterprise Miner.

Modeling Essentials – Decision Trees: Determine type of prediction (prediction rules). Select useful inputs (split search). Optimize complexity (pruning).

Simple Prediction Illustration: The training data are plotted on two inputs, x1 and x2, each ranging from 0 to 1. The task is to predict the dot color for each (x1, x2).

Decision Tree Prediction Rules: A decision tree starts at a root node, splits cases into interior nodes, and ends in leaf nodes. In the illustration, the root node splits on one input (for example, x1 < 0.52 versus x1 ≥ 0.52), interior nodes split further (for example, on x2 at 0.63 and 0.51), and each leaf node carries the proportion of the primary outcome among its training cases (here 40%, 55%, 60%, and 70%). To score a new case, follow the splits that its input values satisfy down to a leaf; the leaf proportion is the prediction (Estimate = 0.70 for the illustrated case).
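
A small illustrative sketch of prediction rules in Python, fitting a shallow tree to simulated data with two inputs named x1 and x2 (this stands in for, and is not identical to, the course's SAS Enterprise Miner example):

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier, export_text

    rng = np.random.default_rng(0)
    X = rng.uniform(size=(500, 2))                    # columns play the roles of x1 and x2
    y = (X[:, 1] + 0.2 * rng.normal(size=500) > 0.6).astype(int)   # synthetic binary target

    tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

    # Each root-to-leaf path is a prediction rule; the class proportions in a
    # leaf supply the estimate for every case that satisfies the rule.
    print(export_text(tree, feature_names=["x1", "x2"]))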

Modeling Essentials – Decision Trees: Determine type of prediction (prediction rules). Select useful inputs (split search). Optimize complexity (pruning).

Decision Tree Split Search: Calculate the logworth of every candidate partition (left versus right) on input x1 and select the partition with the maximum logworth; here the best split on x1 is at 0.52 (left: 53%/47%, right: 42%/58%), with max logworth(x1) = 0.95. Repeat the search for input x2; its best split is at 0.63 (bottom: 54%/46%, top: 35%/65%), with max logworth(x2) = 4.92. Compare the maximum logworth ratings across inputs and create a partition rule from the best partition: split on x2 at 0.63 (<0.63 versus ≥0.63). Then repeat the process within each resulting subset.

Within a subset, the best split on x1 is at 0.52 (left: 61%/39%, right: 55%/45%), with max logworth(x1) = 5.72, while the best split on x2 rates only max logworth(x2) = −2.01 (bottom: 38%/62%, top: 55%/45%). A second partition rule is therefore created on x1 at 0.52 (<0.52 versus ≥0.52). The process repeats to form a maximal tree.
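
The split-search idea can be sketched directly: the logworth of a candidate split is −log10 of the p-value from a chi-square test on the 2×2 table that the split induces (SAS Enterprise Miner also applies multiple-comparison adjustments that this simplified Python sketch omits). The variable names are illustrative.

    import numpy as np
    from scipy.stats import chi2_contingency

    def logworth(x, y, cutoff):
        """-log10 of the chi-square p-value for splitting input x at `cutoff`."""
        left, right = y[x < cutoff], y[x >= cutoff]
        table = np.array([[np.sum(left == 0), np.sum(left == 1)],
                          [np.sum(right == 0), np.sum(right == 1)]])
        _, p_value, _, _ = chi2_contingency(table)
        return -np.log10(p_value)

    def best_split(x, y):
        """Evaluate every candidate cutoff for one input and keep the best."""
        cutoffs = np.unique(x)[1:]                 # split points between observed values
        worths = [logworth(x, y, c) for c in cutoffs]
        best = int(np.argmax(worths))
        return cutoffs[best], worths[best]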

4.02 Poll: The maximal tree is usually the tree that you use to score new data. Yes or No?

4.02 Poll – Correct Answer: No; the maximal tree is typically pruned back before it is used to score new data.

Modeling Essentials – Decision Trees: Determine type of prediction (prediction rules). Select useful inputs (split search). Optimize complexity (pruning).

Predictive Model Sequence: Create a sequence of models of increasing complexity from the training data.

The Maximal Tree: The maximal tree is the most complex model in the sequence.

Pruning One Split: The next model in the sequence is formed by pruning one split from the maximal tree. Each such subtree's predictive performance is rated on the validation data, and the subtree with the highest validation assessment is selected.

Pruning Two Splits: The same procedure is followed for subsequent models: prune two splits from the maximal tree, rate each resulting subtree with the validation assessment, and select the subtree with the best assessment rating.

Subsequent Pruning: Continue pruning until all subtrees are considered.

Selecting the Best Tree: Compare the validation assessment across tree complexities and choose the simplest tree with the highest validation assessment.
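
SAS Enterprise Miner selects the subtree with the best validation assessment. A rough stand-in with scikit-learn is to generate a pruning sequence from the maximal tree with cost-complexity pruning and then keep the simplest subtree that scores best on the validation partition; this is an analogue, not the identical algorithm, and X_train, y_train, X_valid, y_valid are assumed to be numeric training and validation partitions.

    from sklearn.tree import DecisionTreeClassifier

    # Candidate complexities, from the maximal tree down to a single node.
    path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(
        X_train, y_train)

    best_tree, best_score = None, -1.0
    for alpha in path.ccp_alphas:        # larger alpha -> more pruning -> simpler tree
        candidate = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha)
        candidate.fit(X_train, y_train)
        score = candidate.score(X_valid, y_valid)    # validation accuracy
        if score >= best_score:                      # ties go to the simpler tree
            best_tree, best_score = candidate, score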

Validation Assessment: What are appropriate validation assessment ratings?

Assessment Statistics: Appropriate validation assessment ratings depend on the target measurement level (binary, continuous, and so on) and on the prediction type (decisions, rankings, estimates).

Binary Targets: Each case has a target value that is either the primary outcome or the secondary outcome (coded, for example, as 1 and 0).

Binary Target Predictions: For a binary target, a model can produce decisions (primary or secondary), rankings (scores such as 520 or 720), or estimates (probabilities such as 0.249).

Decision Optimization: Assessment of decision predictions compares each decision with the actual outcome.

Decision Optimization – Accuracy: Maximize accuracy, the agreement between outcome and prediction (true positives and true negatives).

Decision Optimization – Misclassification: Minimize misclassification, the disagreement between outcome and prediction (false positives and false negatives).

Ranking Optimization: Assessment of ranking predictions examines whether cases with the primary outcome are scored higher than cases with the secondary outcome.

Ranking Optimization – Concordance: Maximize concordance, the proper ordering of primary and secondary outcomes (secondary outcomes receive low scores, primary outcomes receive high scores).

Ranking Optimization – Discordance: Minimize discordance, the improper ordering of primary and secondary outcomes (secondary outcomes receive high scores, primary outcomes receive low scores).

Estimate Optimization: Assessment of estimate predictions compares each estimate with the actual target value.

Estimate Optimization – Squared Error: Minimize the squared error, (target − estimate)², the squared difference between the target and the prediction.

Complexity Optimization – Summary: decisions are assessed by accuracy or misclassification, rankings by concordance or discordance, and estimates by squared error.
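
These three families of assessment statistics are easy to compute once a model produces predictions. In the Python sketch below, predicted_class and predicted_probability are placeholders for a model's decisions and estimates on the validation partition.

    from sklearn.metrics import accuracy_score, mean_squared_error, roc_auc_score

    # Decisions: accuracy and misclassification.
    accuracy = accuracy_score(y_valid, predicted_class)
    misclassification = 1.0 - accuracy

    # Rankings: the c statistic (area under the ROC curve) summarizes how often
    # primary outcomes are ranked above secondary outcomes (concordance).
    c_statistic = roc_auc_score(y_valid, predicted_probability)

    # Estimates: average squared error between target and estimate.
    average_squared_error = mean_squared_error(y_valid, predicted_probability)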

4.03 Quiz: What are some target variables that you might encounter that would require optimizing on accuracy/misclassification? On concordance/discordance? On average squared error?

Statistical Graphs: ROC curves, gains charts, and lift charts.

Decision Matrix: a cross-tabulation of actual class by predicted class, whose cells contain the true positives, true negatives, false positives, and false negatives.

Sensitivity: the proportion of actual positive cases (class 1) that are predicted to be positive (the true positive rate).

Positive Predicted Value: the proportion of cases predicted to be positive that are actually positive.

Specificity: the proportion of actual negative cases that are predicted to be negative (the true negative rate).

Negative Predicted Value: the proportion of cases predicted to be negative that are actually negative.

ROC Curve: plots sensitivity against 1 − specificity as the decision cutoff varies.

Gains Chart: plots the cumulative percentage of primary outcomes captured against the percentage of cases selected, with cases ordered by decreasing model score.
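
A cumulative gains (and lift) table can be built by sorting the validation cases by score; in this sketch, y_valid and predicted_probability are again placeholders for the validation targets and model scores.

    import pandas as pd

    scored = pd.DataFrame({"target": y_valid, "score": predicted_probability})
    scored = scored.sort_values("score", ascending=False).reset_index(drop=True)

    # Ten equal-sized groups, best-scoring cases first.
    scored["decile"] = pd.qcut(scored.index, 10, labels=False) + 1

    cases = scored.groupby("decile").size().cumsum() / len(scored)
    gains = scored.groupby("decile")["target"].sum().cumsum() / scored["target"].sum()
    print(pd.DataFrame({"cumulative_gain": gains, "cumulative_lift": gains / cases}))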

Catalog Case Study: Steps to Build a Decision Tree Add the CATALOG2010 data source to the diagram. Use the Data Partition node to split the data into training and validation data sets. Use the Decision Tree node to select useful inputs. Use the Model Comparison node to generate model assessment statistics and plots.

Constructing a Decision Tree Predictive Model Catalog Case Study Task: Construct a decision tree model.

Chapter 4: Predictive Modeling 4.1 Introduction to Predictive Modeling 4.2 Predictive Modeling Using Decision Trees 4.3 Predictive Modeling Using Logistic Regression 4.4 Churn Case Study 4.5 A Note about Model Management 4.6 Recommended Reading

Objectives Explain the concepts of logistic regression. Discuss modeling strategies for building a predictive model. Fit a predictive logistic regression model in SAS Enterprise Miner.

Modeling Essentials – Regressions: Determine type of prediction (prediction formula). Select useful inputs (variable clustering and selection). Optimize complexity (best model from sequence).

Simple Linear Regression Model: the model is the best-fit line through the training data.

Linear Regression Prediction Formula: ŷ = β0 + β1 x1 + β2 x2, where ŷ is the prediction estimate, β0 is the intercept estimate, β1 and β2 are parameter estimates, and x1 and x2 are input measurements. The intercept and parameter estimates are chosen to minimize the squared error function ∑(yi − ŷi)² over the training data.

Binary Target: Linear regression does not work well for a binary target because, whatever the form of the equation, its predictions are generally unbounded. Instead, you work with the probability p that the event will occur rather than with a direct classification.

Odds Instead of Probability: Consider the probability p of an event occurring (such as a horse losing a race). The probability of the event not occurring is 1 − p. The odds of the event happening are p:(1 − p), although odds are more commonly expressed with integers, such as a 19-to-1 long shot at the race track. The ratio 19:1 means that the horse has one chance of winning for 19 chances of losing, so the probability of winning is 1/(19 + 1) = 5%.

Properties of Odds and Log Odds: Odds are not symmetric; they vary from 0 to infinity and equal 1 when the probability is 50%. Log odds are symmetric, ranging from negative infinity to positive infinity like a line; they equal 0 when the probability is 50%, are highly negative for low probabilities, and are highly positive for high probabilities.

Logistic Regression Prediction Formula: log(p / (1 − p)) = β0 + β1 x1 + β2 x2. The values on the left-hand side are called logit scores.

Logit Link Function: logit(p) = log(p / (1 − p)). The logit link function transforms probabilities (between 0 and 1) into logit scores (between −∞ and +∞).

Logit Link Function: Writing the model as logit(p) = β0 + β1 x1 + β2 x2, prediction estimates are obtained by solving the logit equation for p: p = 1 / (1 + e^(−logit(p))).
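
The logit and its inverse are one-liners in Python; a small sketch:

    import numpy as np

    def logit(p):
        """Probability in (0, 1) -> logit score in (-inf, +inf)."""
        return np.log(p / (1.0 - p))

    def inverse_logit(score):
        """Logit score -> probability; this recovers the prediction estimate."""
        return 1.0 / (1.0 + np.exp(-score))

    print(logit(0.5), inverse_logit(0.0))    # 0.0 and 0.5: a 50% probability maps to logit 0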

4.04 Poll: Linear regression on a binary target is a problem because predictions can range outside of 0 and 1. Yes or No?

4.04 Poll – Correct Answer: Yes.

Simple Prediction Illustration – Regressions: Predict dot color for each (x1, x2) from logit(p) = β0 + β1 x1 + β2 x2; the intercept and parameter estimates are needed first. The parameter estimates are found by maximizing the log-likelihood function. Using the maximum likelihood estimates, the prediction formula assigns a logit score, and hence a probability estimate (contours such as 0.40, 0.50, 0.60, and 0.70), to each (x1, x2).

Regressions: Beyond the Prediction Formula Manage missing values. Interpret the model. Handle extreme or unusual values. Use nonnumeric inputs. Account for nonlinearities.

Missing Values and Regression Modeling – Problem 1: Training data cases with missing values on inputs used by a regression model are ignored.

Missing Values and Regression Modeling – Consequence: Missing values can significantly reduce the amount of training data available for regression modeling.

Missing Values and the Prediction Formula – Problem 2: Prediction formulas cannot score cases with missing values, for example a case with (x1, x2) = (0.3, ?).

Missing Value Issues: Problem 1: Training data cases with missing values on inputs used by a regression model are ignored. Problem 2: Prediction formulas cannot score cases with missing values.

Missing Value Causes: a non-applicable measurement, no match on a merge, or a non-disclosed measurement.

Missing Value Remedies: replace the missing value with a synthetic value drawn from the input's distribution, or estimate it from the other inputs, xi = f(x1, …, xp).
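
A sketch of the synthetic-value remedy for numeric inputs, assuming pandas DataFrames such as the X_train and X_valid partitions from the earlier sketch; medians come from the training data only, and indicator columns preserve the fact that a value was missing.

    import pandas as pd

    def impute_with_indicators(train_inputs: pd.DataFrame, other_inputs: pd.DataFrame):
        """Median-impute numeric inputs and flag which values were originally missing."""
        medians = train_inputs.median(numeric_only=True)
        results = []
        for frame in (train_inputs, other_inputs):
            flags = frame.isna().astype(int).add_suffix("_MISSING")
            results.append(pd.concat([frame.fillna(medians), flags], axis=1))
        return results

    train_imputed, valid_imputed = impute_with_indicators(X_train, X_valid)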

4.05 Poll: Observations with missing values should always be deleted from scoring because a predicted value cannot be determined. Yes or No?

4.05 Poll – Correct Answer: No. You can impute a missing value and then obtain a prediction.

Modeling Essentials – Regressions: Determine type of prediction (prediction formula). Select useful inputs (variable clustering and sequential selection). Optimize complexity (best model from sequence).

Variable Redundancy: Many inputs carry overlapping information; variable clustering groups correlated inputs so that redundant ones can be set aside.

Variable Clustering: Correlated inputs (for example, X1 through X10) are grouped into clusters, and inputs are selected from each cluster by cluster representation (the 1 − R² ratio), expert opinion, or correlation with the target.

Selection by 1 − R² Ratio: For each input, compute (1 − R² with its own cluster) / (1 − R² with the next closest cluster). For example, with R² = 0.90 for the own cluster and R² = 0.01 for the next closest cluster, the ratio is (1 − 0.90) / (1 − 0.01) = 0.101; inputs with small ratios best represent their clusters.

Modeling Essentials – Regressions: Determine type of prediction (prediction formula). Select useful inputs (variable clustering and sequential selection). Optimize complexity (best model from sequence).

Sequential Selection – Forward: Begin with no inputs in the model. At each step, compare each remaining input's p-value against the entry cutoff and add the best-qualifying input; stop when no remaining input's p-value falls below the entry cutoff.

Sequential Selection – Backward: Begin with all inputs in the model. At each step, compare each input's p-value against the stay cutoff and remove the weakest input whose p-value exceeds the cutoff; stop when every remaining input's p-value falls below the stay cutoff.

Sequential Selection – Stepwise: Combine forward and backward selection. Inputs are added when their p-values fall below the entry cutoff, but after each addition, any input whose p-value has risen above the stay cutoff is removed; the process stops when no input can be added or removed.
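
A stripped-down forward pass in Python, scoring candidates by their p-values in a logistic regression fit with statsmodels; the 0.05 entry cutoff is illustrative, and backward and stepwise selection follow the same pattern with a stay cutoff.

    import statsmodels.api as sm

    def forward_select(X, y, entry_cutoff=0.05):
        """Greedy forward selection on logistic regression p-values."""
        selected, remaining = [], list(X.columns)
        while remaining:
            pvalues = {}
            for candidate in remaining:
                design = sm.add_constant(X[selected + [candidate]])
                fit = sm.Logit(y, design).fit(disp=0)
                pvalues[candidate] = fit.pvalues[candidate]
            best = min(pvalues, key=pvalues.get)
            if pvalues[best] >= entry_cutoff:      # no remaining input clears the cutoff
                break
            selected.append(best)
            remaining.remove(best)
        return selected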

4.06 Poll: Different model selection methods often result in different candidate models, and no one method is uniformly the best. Yes or No?

4.06 Poll – Correct Answer: Yes.

Modeling Essentials – Regressions: Determine type of prediction (prediction formula). Select useful inputs (variable clustering and selection). Optimize complexity (best model from sequence).

Model Fit versus Complexity: Evaluate the model fit statistic at each step of the selection sequence, on both the training data and the validation data.

Select Model with Optimal Validation Fit: Evaluate each sequence step and choose the simplest model with the optimal validation fit.

Beyond the Prediction Formula Manage missing values. Interpret the model. Handle extreme or unusual values. Use nonnumeric inputs. Account for nonlinearities.

Interpretation: A one-unit change in an input (for example, x2) changes logit(p) by the corresponding parameter estimate β2, which corresponds to a 100·(exp(β2) − 1)% change in the odds.

Odds Ratio from a Logistic Regression Model: Estimated logistic regression model: logit(p) = −0.7567 + 0.4373·(gender). Estimated odds ratio (females to males): odds ratio = e^(−0.7567 + 0.4373) / e^(−0.7567) = e^0.4373 ≈ 1.55. An odds ratio of 1.55 means that females have 1.55 times the odds of having the outcome compared to males.
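
The same arithmetic in code, using the coefficient values shown above (gender coded 1 for females and 0 for males, as the slide implies):

    import numpy as np

    intercept, gender_coef = -0.7567, 0.4373

    odds_female = np.exp(intercept + gender_coef)   # odds of the outcome when gender = 1
    odds_male = np.exp(intercept)                   # odds of the outcome when gender = 0

    print(round(odds_female / odds_male, 2))        # about 1.55, equal to exp(gender_coef)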

Properties of the Odds Ratio: The odds ratio ranges from 0 upward. A value of 1 indicates no association; values below 1 mean the group in the denominator has higher odds of the event, and values above 1 mean the group in the numerator has higher odds of the event.

Beyond the Prediction Formula Manage missing values. Interpret the model. Handle extreme or unusual values. Use nonnumeric inputs. Account for nonlinearities.

Extreme Distributions and Regressions: On the original input scale, a skewed input distribution produces high leverage points, and the standard regression fit can be pulled away from the true association.

Regularizing Input Transformations: Transforming the skewed input to a more symmetric (regularized) scale reduces the influence of the high leverage points; the regularized estimate follows the true association more closely than the standard regression fit on the original input scale.
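
A common regularizing transformation is a log transform of a right-skewed input before fitting the regression; the column name below is purely illustrative.

    import numpy as np

    # log1p = log(1 + x) handles zeros and pulls in the long right tail,
    # reducing the leverage of extreme values on the regression fit.
    X_train["LOG_ORDER_TOTAL"] = np.log1p(X_train["ORDER_TOTAL"])
    X_valid["LOG_ORDER_TOTAL"] = np.log1p(X_valid["ORDER_TOTAL"])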

Idea Exchange What are examples of variables with unusual distributions that could produce problems in a regression model? Would you transform these variables? If so, what types of transformations would you entertain?

Beyond the Prediction Formula Manage missing values. Interpret the model. Handle extreme or unusual values. Use nonnumeric inputs. Account for nonlinearities.

Nonnumeric Input Coding: A two-level variable with levels A and B can be coded with two dummy variables, DA and DB: level A is coded (DA, DB) = (1, 0) and level B is coded (0, 1). The coding is redundant, because DB = 1 − DA.

Nonnumeric Input Coding – Many Levels: A nine-level variable with levels A through I is coded with nine dummy variables, DA through DI; each level sets its own dummy to 1 and all of the others to 0.

Coding Redundancy – Many Levels: As in the two-level case, the coding is redundant; one dummy (for example, DI) equals 1 minus the sum of the others and can be dropped.

Coding Consolidation: Levels with similar target behavior can share a dummy variable; for example, levels A through D can be consolidated into a single dummy DABCD, E and F into DEF, and G and H into DGH, which greatly reduces the number of coded inputs.
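
Dummy coding and level consolidation take a few lines with pandas; REGION and its level groupings here are hypothetical.

    import pandas as pd

    # Full dummy coding; drop_first removes the redundant indicator.
    dummies = pd.get_dummies(catalog["REGION"], prefix="D", drop_first=True)

    # Consolidation: map levels that behave alike onto a shared code first.
    consolidated = catalog["REGION"].replace(
        {"A": "ABCD", "B": "ABCD", "C": "ABCD", "D": "ABCD",
         "E": "EF", "F": "EF", "G": "GH", "H": "GH"})
    dummies_small = pd.get_dummies(consolidated, prefix="D", drop_first=True)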

Beyond the Prediction Formula Manage missing values. Interpret the model. Handle extreme or unusual values. Use nonnumeric inputs. Account for nonlinearities.

Standard Logistic Regression: log(p / (1 − p)) = w0 + w1·x1 + w2·x2. The fitted probability contours (for example, 0.40, 0.50, 0.60, 0.70) are parallel straight lines in the (x1, x2) input space.

Polynomial Logistic Regression: log(p / (1 − p)) = w0 + w1·x1 + w2·x2 + w3·x1² + w4·x2² + w5·x1·x2. The added quadratic and interaction terms allow curved probability contours.
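
The quadratic and interaction terms can be generated rather than typed by hand; a sketch with scikit-learn, where a degree-2 expansion of two inputs adds exactly the x1², x2², and x1·x2 terms shown above (X and y stand for any numeric input matrix and binary target, such as the simulated data from the earlier tree sketch).

    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures

    # The expanded terms let the logistic regression fit a curved boundary.
    poly_logistic = make_pipeline(
        PolynomialFeatures(degree=2, include_bias=False),
        LogisticRegression(max_iter=1000))
    poly_logistic.fit(X, y)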

Idea Exchange What are some predictors that you can think of that would have a nonlinear relationship with a target? What do you think the functional form of the relationship is (for example, quadratic, exponential, …)?

Catalog Case Study Analysis Goal: A mail-order catalog retailer wants to save money on mailing and increase revenue by targeting mailed catalogs to customers most likely to purchase in the future. Data set: CATALOG2010 Number of rows: 48,356 Number of columns: 98 Contents: sales figures summarized across departments and quarterly totals for 5.5 years of sales Targets: RESPOND (binary) ORDERSIZE (continuous)

Fitting a Logistic Regression Model Catalog Case Study Task: Build a logistic regression model in SAS Enterprise Miner.

Catalog Case Study: Steps to Build a Logistic Regression Model Add the CATALOG2010 data source to the diagram. Use the Data Partition node to split the data into training and validation data sets. Use the Variable Clustering node to select relatively independent inputs. Use the Regression node to select relevant inputs. Use the Model Comparison node to generate model assessment statistics and plots.  In the previous example, you performed steps 1 and 2.

Chapter 4: Predictive Modeling 4.1 Introduction to Predictive Modeling 4.2 Predictive Modeling Using Decision Trees 4.3 Predictive Modeling Using Logistic Regression 4.4 Churn Case Study 4.5 A Note about Model Management 4.6 Recommended Reading

Objectives Formulate an objective for predicting churn in a telecommunications example. Generate predictive models in SAS Enterprise Miner to predict churn. Score a customer database to target who is most likely to churn.

Telecommunications Company: A mobile (prepaid and postpaid) and fixed-line service provider. In recent years, a high percentage of high-revenue subscribers have churned. The company wants to target subscribers with a high churn probability for its customer retention program.

Churn Score A churn propensity score measures the propensity for an active customer to churn. The score enables marketing managers to take proactive steps to retain targeted customers before churn occurs. Churn scores are derived from analysis of the historical behavior of churned customers and existing customers who have not churned.

Possible Predictor Variables: outstanding bill value, outstanding balance period, number of calls, call duration (international, local, and national calls), period as a customer, total dropped calls, and total failed calls.

Model Implementation: The model's predictions might be added to a data source inside or outside of SAS Enterprise Miner.
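
Outside SAS Enterprise Miner, implementation amounts to applying the fitted model to new records and writing the scores back to a data source; final_model, model_inputs, and the file names below are placeholders.

    import pandas as pd

    new_customers = pd.read_csv("active_customers.csv")    # data set to be scored

    # Append the churn propensity score; columns must match the training inputs.
    new_customers["CHURN_SCORE"] = final_model.predict_proba(
        new_customers[model_inputs])[:, 1]

    new_customers.sort_values("CHURN_SCORE", ascending=False).to_csv(
        "churn_risk_list.csv", index=False)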

Churn Case Study Examine the CHURN_TELECOM data set and add it to a diagram. Partition the data into training and validation data sets. Perform missing value imputation. Recode nominal variables to combine class levels. Reduce redundancy with variable clustering. Reduce irrelevant inputs with a decision tree and a logistic regression. Compare results and select the final model based on validation error. Score a data set to generate the list of churn-risk customers.

Analyzing Churn Data Churn Case Study Task: Analyze churn data.

Chapter 4: Predictive Modeling 4.1 Introduction to Predictive Modeling 4.2 Predictive Modeling Using Decision Trees 4.3 Predictive Modeling Using Logistic Regression 4.4 Churn Case Study 4.5 A Note about Model Management 4.6 Recommended Reading

Objectives Discuss the movement of analytics from the “back office” to the executive level and the reasons for these changes. Describe the three-way pull for model management. Explain why models must be maintained and reassessed over time.

Model Management and Business Analytics: Model management is the assessment, deployment, and continued modification of models, and it is a critical business process: demonstrate that the model is well developed, verify that the model is working well, and perform outcomes analysis. Model management requires a collaborative effort across the company, for example among the VP of the Decision Analysis and Support Group, senior modeling analysts, enterprise architects, internal validation and compliance analysts, and database administrators.

Many companies now recognize the importance of sound model management practices and treat it as an ongoing critical business process rather than a one-time activity. The key elements are the following. First, the model must be well developed: models should be logical on an a priori basis, built with well-defined and generally accepted techniques and methods, and robust enough to generalize when applied to new data. Second, confirm that the model is implemented as designed: check the scoring code that deploys the model, verify the quality of the data (are the correct inputs available and properly defined?), document the model, and compare its predictions with other sources of data and with challenger models. Third, perform outcomes analysis: where practical, compare model predictions with actual outcomes. Changing market and operating conditions make this an ongoing, iterative process. Implementing these elements is a collaborative effort that spans many responsibilities: creating and institutionalizing processes and procedures for model development and implementation; directing model design, development, and implementation consistent with internal and external policies; defining model performance reports and monitoring performance on an ongoing basis; performing sample selection and design; carrying out statistical data discovery, transformation, and enrichment; developing candidate predictive and descriptive models; recalibrating or refining models that do not meet performance targets; defining the system and data architectures required to support business processes; collecting and preparing data sources for model development and deployment; validating score code execution; deploying models into operational scoring systems (call centers, credit origination systems, campaign management, and so on); and performing independent model validation reviews to ensure that models perform correctly and comply with model specifications.

Analytical Model Management Challenges: largely manual processes for moving models into production; a proliferation of data and models; pressure to produce actionable inferences; integration with operational systems; and increased regulation (such as Sarbanes-Oxley and Basel II) that demands close attention.

Three-Way Pull for Model Management: business value, production process, and governance process. The challenges just described (a growing model portfolio, manual model deployment, tighter integration with operational systems, and increased regulatory oversight) create three influences that pull for a model management process. First, it is critical to understand the business value of your analytical models: how can the portfolio of models be managed more effectively? Second, there must be a specific process for moving analytical models from development into the production environment: how can models be deployed efficiently without disrupting production systems? Finally, the actions taken throughout the model management process must be documented: who is accountable for the success of the models, and is appropriate documentation available for regulatory compliance?

Three-Way Pull for Model Management: Business Value (deployment of the "best" models, consistent model development and validation, understanding of model strategy and lifetime value); Production Process (efficient deployment of models in a timely manner, effective deployment to minimize operational risk); Governance Process (audit trails for compliance purposes, justification for management and shareholders).

At the core of analytical model development is understanding the business value that the models bring to the organization. Models must be evaluated as part of the enterprise and in context with other analytical models already in place or being evaluated as champions. Consistency of development extends across the entire model development cycle, from data staging through model generation to model consumption; that consistency provides a common method for measuring value, allowing apples-to-apples comparisons of results. Over time, all analytical models tend to lose their value as behavior patterns shift and environmental factors change, so it is also important to retire models at the appropriate time, before they have a negative impact on the business. Once an analytical model has been generated, the next step is to deploy it into the production environment. Models do not have an infinite lifetime, and speed to market is a factor. Clearly defined processes lend themselves to efficiency: handoffs between departments and personnel should be clearly defined and understood, and repeatable, controlled, documented processes let models be deployed in the most efficient and effective manner. Governance involves both external and internal requirements for accountability, including enforcing ownership of the development, test, and production process, supporting version control, and providing documentation for each step, as well as being accountable to corporate management and shareholders for the results that are ultimately generated.

Changes in the Analytical Landscape: The model management process now involves stakeholders across the organization (customer service, retail, logistics, promotions, operations, management, customers, suppliers, employees, and stockholders) alongside analytical modelers, IT operations, data integrators, and business governance. Instead of individual modelers managing their own process, an enterprise process is in place in which the steps of model deployment are handled by multiple individuals in multiple departments. Collaboration between modelers and departments becomes a key factor in successfully implementing analytical models in the production environment.

Model Management: As models proliferate, you need to be more diligent, but there is often no established process for deploying models into production, model deployment is inefficient, and more individuals and groups in the organization must be involved in the process. You also need to be more vigilant, but it is difficult to effectively manage existing models and track the model life cycle, and difficult to consistently provide appropriate internal and regulatory documentation.

The goals for model management are straightforward: organizations want to be more diligent about how models are deployed and more vigilant in understanding how their models perform. However, they struggle with inefficient deployment mechanisms and with managing their model portfolios. Models do not have an infinite lifetime; they are built on a snapshot of data, and the longer it takes to deploy a model into production, the less value that model can generate. There must be a process for getting models into production quickly, and all stakeholders involved in implementing the model need to be part of that process. Once a model is deployed into the production environment, organizations need support to remain vigilant about its ongoing performance. Too often a model is deployed and then forgotten: it continues to run over time, but it is based on old assumptions and outdated behavior patterns, and there is no mechanism in place to warn about degrading performance.

Idea Exchange How can you implement model management in your organization? Do you already have systems in place for continuous improvement and monitoring of models? For audit trails and compliance checks? Describe briefly how they operate.

Lessons Learned: Model management is a key part of good business analytics. Models should be evaluated before, during, and after deployment, and new models replace old ones as dictated by the data over time.

Data mining comes in two forms. Directed data mining searches through historical records to find patterns that explain a particular outcome; it includes the tasks of classification, estimation, prediction, and profiling. Undirected data mining searches through the same records for interesting patterns; it includes the tasks of clustering, finding association rules, and description.

The primary lesson of this chapter is that data mining is full of traps for the unwary, and following a data mining methodology based on experience can help avoid them. The first hurdle is translating the business problem into one of the six tasks that can be solved by data mining: classification, estimation, prediction, affinity grouping, clustering, and profiling. The next challenge is to locate appropriate data that can be transformed into actionable information. Once the data has been located, it should be explored thoroughly; the exploration process is likely to reveal problems with the data and will also help build the data miner's intuitive understanding of it. The next step is to create a model set and partition it into training, validation, and test sets. Data transformations are necessary for two purposes: to fix problems with the data, such as missing values and categorical variables that take on too many values, and to bring information to the surface by creating new variables that represent trends and other ratios and combinations. Once the data has been prepared, building models is a relatively easy process. Each type of model has its own metrics by which it can be assessed, but there are also assessment tools that are independent of the type of model. Among the most important of these are the lift chart, which shows how the model has increased the concentration of the desired value of the target variable, and the confusion matrix, which shows the misclassification error rate for each of the target classes.

Chapter 4: Predictive Modeling 4.1 Introduction to Predictive Modeling 4.2 Predictive Modeling Using Decision Trees 4.3 Predictive Modeling Using Logistic Regression 4.4 Churn Case Study 4.5 A Note about Model Management 4.6 Recommended Reading

Recommended Reading Davenport, Thomas H., Jeanne G. Harris, and Robert Morison. 2010. Analytics at Work: Smarter Decisions, Better Results. Boston: Harvard Business Press. Chapters 7 and 8. These chapters focus on making analytics an integral part of a business: systems, processes, and organizational culture must work together to move toward analytical leadership. The remaining three chapters of the book (9-11) are optional, self-study material.

Recommended Reading May, Thornton. 2010. The New Know: Innovation Powered by Analytics. New York: Wiley. Chapter 1. May's book provides a counterpoint to the Davenport et al. book from the perspective of the role of analysts in the organization and how organizations can make the best use of their analytical talent.

Recommended Reading Morris, Michael. "Mining Student Data Could Save Lives." The Chronicle of Higher Education. October 2, 2011. http://chronicle.com/article/Mining-Student-Data-Could-Save/129231/ This article discusses the mining of student data at colleges and universities to prevent large-scale acts of violence on campus. Mining students' data (including Internet usage and social networking data) would enhance the capacity of threat-assessment teams to protect the health and safety of students.