Chapter 4: Predictive Modeling


1 Chapter 4: Predictive Modeling
4.1 Introduction to Predictive Modeling 4.2 Predictive Modeling Using Decision Trees 4.3 Predictive Modeling Using Logistic Regression 4.4 Churn Case Study 4.5 A Note about Model Management 4.6 Recommended Reading


3 Objectives Explain the concepts of predictive modeling.
Illustrate the modeling essentials of a predictive model. Explain the importance of data partitioning.

4 Catalog Case Study Analysis Goal: A mail-order catalog retailer wants to save money on mailing and increase revenue by targeting mailed catalogs to customers most likely to purchase in the future. Data set: CATALOG2010 Number of rows: 48,356 Number of columns: 98 Contents: sales figures summarized across departments and quarterly totals for 5.5 years of sales Targets: RESPOND (binary) ORDERSIZE (continuous)

5 Where You’ve Been, Where You’re Going…
With basic descriptive modeling techniques (RFM), you identified customers who might be profitable. Sophisticated predictive modeling techniques can produce risk scores for current customers, profitable prospects from outside the customer database, cross-sell and up-sell lists, and much more. Scoring techniques based on predictive models can be implemented in real-time data collection systems, automating the process of fact-based decision making.

6 Descriptive Modeling Tells You about Now
Descriptive statistics inform you about your sample. This information is important for reacting to things that have happened in the past. [Diagram: past behavior → fact-based reports → current state of the customer.]

7 From Descriptive to Predictive Modeling
Predictive modeling techniques, paired with scoring and good model management, enable you to use your data about the past and the present to make good decisions for the future. [Diagram: past behavior → fact-based predictions → strategy.]

8 Predictive Modeling Terminology
Training Data Set: the variables are called inputs and targets. The observations in a training data set are known as training cases. [Diagram: a training data table with input columns and a target column.]

9 Predictive Model
A predictive model is a concise representation of the association between the inputs and the target. [Diagram: training data set with inputs and target feeding the model.]

10 Predictive Model
Predictions: the output of the predictive model given a set of input measurements. [Diagram: inputs flow through the model to produce predictions.]

11 Modeling Essentials Determine type of prediction.
Select useful inputs. Optimize complexity.


13 Three Prediction Types
[Diagram: inputs produce a prediction of one of three types: decisions, rankings, or estimates.]

14 Decision Predictions
A predictive model uses input measurements to make the best decision for each case. [Diagram: predictions labeled primary, secondary, or tertiary.]

15 Ranking Predictions
A predictive model uses input measurements to optimally rank each case. [Diagram: predictions are rank scores such as 470, 520, 580, 630, and 720.]

16 Estimate Predictions
A predictive model uses input measurements to optimally estimate the target value. [Diagram: predictions are estimates such as 0.28, 0.33, 0.54, 0.65, and 0.75.]

17 Idea Exchange Think of two or three business problems that would require each of the three types of prediction. What would require a decision? How would you obtain information to help you in making a decision based on a model score? What would require a ranking? How would you use this ranking information? What would require an estimate? Would you estimate a continuous quantity, a count, a proportion, or some other quantity?

18 Modeling Essentials – Predict Review
Determine type of prediction → decide, rank, and estimate. Select useful inputs. Optimize complexity.

19 Modeling Essentials Determine type of prediction.
Select useful inputs. Optimize complexity.

20 Input Reduction Strategies
[Diagram: redundancy between inputs x1 and x2; irrelevancy of input x3 relative to x4 (prediction contours 0.40–0.70).]

21 Input Reduction – Redundancy
Input x2 has the same information as input x1. Example: x1 is household income and x2 is home value.

22 Input Reduction – Irrelevancy
Predictions change with input x4 but much less with input x3. Example: the target is response to a direct mail solicitation, x3 is religious affiliation, and x4 is response to previous solicitations.

23 Modeling Essentials – Select Review
Determine type of prediction → decide, rank, and estimate. Select useful inputs → eradicate redundancies and irrelevancies. Optimize complexity.

24 Modeling Essentials Determine type of prediction.
Select useful inputs. Optimize complexity.

25 Data Partitioning
Partition the available data into training and validation sets. The model is fit on the training data set, and model performance is evaluated on the validation data set. [Diagram: training data and validation data, each with inputs and a target.]
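In code, the partitioning step can be sketched as a simple random split (a toy illustration of the idea, not the SAS Enterprise Miner Data Partition node itself; the 70/30 fraction and the seed are arbitrary choices):

```python
import random

def partition(cases, train_frac=0.7, seed=12345):
    """Randomly split cases into training and validation sets."""
    rng = random.Random(seed)        # fixed seed makes the split reproducible
    shuffled = list(cases)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

train, valid = partition(range(100))
```

The model is then fit on `train` and its performance is rated on `valid`.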

26 Predictive Model Sequence
Create a sequence of models with increasing complexity (models 1–5). [Diagram: training and validation data; models ordered along a model complexity axis.]

27 Model Performance Assessment
Rate model performance using the validation data. [Diagram: validation assessment plotted against model complexity for models 1–5.]

28 Model Selection
Select the simplest model with the highest validation assessment. [Diagram: validation assessment versus model complexity for models 1–5.]

29 4.01 Multiple Choice Poll The best model is the
simplest model with the best performance on the training data. simplest model with the best performance on the validation data. most complex model with the best performance on the training data. most complex model with the best performance on the validation data. B

30 4.01 Multiple Choice Poll – Correct Answer
The best model is the simplest model with the best performance on the training data. simplest model with the best performance on the validation data. most complex model with the best performance on the training data. most complex model with the best performance on the validation data. B

31 Modeling Essentials – Optimize Review
Determine type of prediction → decide, rank, and estimate. Select useful inputs → eradicate redundancies and irrelevancies. Optimize complexity → tune models with validation data.

32 Chapter 4: Predictive Modeling
4.1 Introduction to Predictive Modeling 4.2 Predictive Modeling Using Decision Trees 4.3 Predictive Modeling Using Logistic Regression 4.4 Churn Case Study 4.5 A Note about Model Management 4.6 Recommended Reading

33 Objectives Explain the concept of decision trees.
Illustrate the modeling essentials of decision trees. Construct a decision tree predictive model in SAS Enterprise Miner.

34 Modeling Essentials – Decision Trees
Prediction rules → determine type of prediction. Split search → select useful inputs. Pruning → optimize complexity.

35 Simple Prediction Illustration
Training data: predict dot color for each (x1, x2). [Scatter plot of the training cases on x1 (horizontal, 0.0–1.0) versus x2 (vertical, 0.0–1.0).]

36 Decision Tree Prediction Rules
[Decision tree over the x1–x2 scatter plot: the root node splits on x2 at 0.63, interior nodes split on x1 at 0.52 and 0.51, and leaf nodes show outcome percentages of 40%, 55%, 60%, and 70%.]

37 Decision Tree Prediction Rules
[The same tree drawn as prediction rules: the root node splits on x2 (< 0.63 vs. ≥ 0.63), interior nodes split on x1 (< 0.52 vs. ≥ 0.52 and < 0.51 vs. ≥ 0.51), and leaf nodes carry the percentages 40%, 55%, 60%, and 70%.]

38 Decision Tree Prediction Rules
Predict: [Tracing a new case through the prediction rules, the case lands in the 70% leaf, giving Estimate = 0.70.]

39 Modeling Essentials – Decision Trees
Prediction rules → determine type of prediction. Split search → select useful inputs. Pruning → optimize complexity.

40 Decision Tree Split Search
Calculate the logworth of every partition on input x1. [Scatter plot with a candidate vertical split of x1 into left and right; a classification matrix tallies the target counts on each side.]

41 Decision Tree Split Search
Select the partition with the maximum logworth: max logworth(x1) = 0.95 at x1 = 0.52 (left: 53% vs. 47%; right: 42% vs. 58%).

42 Decision Tree Split Search
Repeat for input x2. [The x1 result, max logworth(x1) = 0.95, is retained for comparison.]

43 Decision Tree Split Search
[Candidate split of x2 into bottom and top at x2 = 0.63: max logworth(x2) = 4.92 (bottom: 54% vs. 46%; top: 35% vs. 65%).]

44 Decision Tree Split Search
Compare partition logworth ratings: max logworth(x2) = 4.92 beats max logworth(x1) = 0.95.

45 Decision Tree Split Search
Create a partition rule from the best partition across all inputs: x2 < 0.63 versus x2 ≥ 0.63.

46 Decision Tree Split Search
Repeat the process in each subset. [The rule x2 < 0.63 vs. ≥ 0.63 now partitions the scatter plot.]

47 Decision Tree Split Search
[Candidate x1 split within the lower subset at x1 = 0.52: max logworth(x1) = 5.72 (left: 61% vs. 39%; right: 55% vs. 45%).]

48 Decision Tree Split Search
[Candidate x2 split within the same subset: max logworth(x2) = -2.01 (bottom: 38% vs. 62%; top: 55% vs. 45%).]

49 Decision Tree Split Search
[Comparison within the subset: max logworth(x1) = 5.72 versus max logworth(x2) = -2.01.]

50 Decision Tree Split Search
[The winning partition in this subset is x1 at 0.52, with max logworth(x1) = 5.72.]

51 Decision Tree Split Search
Create a second partition rule: within x2 < 0.63, split on x1 < 0.52 versus x1 ≥ 0.52.

52 Decision Tree Split Search
Repeat to form a maximal tree. [The scatter plot is partitioned into many small rectangles.]
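The logworth used in the split search is the negative base-10 logarithm of the split's chi-square p-value. A minimal sketch for a binary target and one candidate split, assuming a Pearson chi-square test with 1 degree of freedom (the Decision Tree node also applies adjustments, such as for multiple comparisons, that are omitted here):

```python
import math

def logworth(n00, n01, n10, n11):
    """-log10 of the chi-square p-value for a 2x2 branch-by-target table.
    Rows: left/right branch counts; columns: target 0/1. Counts must be > 0."""
    n = n00 + n01 + n10 + n11
    row0, row1 = n00 + n01, n10 + n11
    col0, col1 = n00 + n10, n01 + n11
    stat = 0.0
    for obs, r, c in ((n00, row0, col0), (n01, row0, col1),
                      (n10, row1, col0), (n11, row1, col1)):
        expected = r * c / n
        stat += (obs - expected) ** 2 / expected
    # survival function of the chi-square distribution with 1 degree of freedom
    p = max(math.erfc(math.sqrt(stat / 2)), 1e-300)  # guard against underflow
    return -math.log10(p)
```

A perfectly balanced split has logworth 0; the stronger the separation of the target between the two branches, the larger the logworth.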

53 4.02 Poll The maximal tree is usually the tree that you use to score new data.  Yes  No No

54 4.02 Poll – Correct Answer The maximal tree is usually the tree that you use to score new data.  Yes  No No

55 Modeling Essentials – Decision Trees
Prediction rules → determine type of prediction. Split search → select useful inputs. Pruning → optimize complexity.

56 Predictive Model Sequence
Create a sequence of models with increasing complexity (models 1–6). [Diagram: training and validation data; models ordered along a model complexity axis.]

57 The Maximal Tree
Create a sequence of models with increasing complexity; a maximal tree is the most complex model in the sequence. [Diagram: models 1–6 ordered by complexity, with the maximal tree at the top.]

58 The Maximal Tree
[Animation frame: the maximal tree highlighted at the complex end of the model sequence.]

59 Pruning One Split
The next model in the sequence is formed by pruning one split from the maximal tree.

60 Pruning One Split
Each subtree's predictive performance is rated on the validation data.

61 Pruning One Split
The subtree with the highest validation assessment is selected.

62 Pruning Two Splits
Similarly, this is done for subsequent models.

63 Pruning Two Splits
Prune two splits from the maximal tree,…

64 Pruning Two Splits
…rate each subtree using validation assessment, and…

65 Pruning Two Splits
…select the subtree with the best assessment rating.

66 Subsequent Pruning
Continue pruning until all subtrees are considered.

67 Selecting the Best Tree
Compare validation assessment across tree complexities. [Plot: validation assessment versus model complexity.]

68 Validation Assessment
Choose the simplest model with the highest validation assessment.

69 Validation Assessment
What are appropriate validation assessment ratings?

70 Assessment Statistics
Ratings depend on the target measurement (binary, continuous, and so on) and the prediction type (decisions, rankings, estimates).

71 Binary Targets
[Training data with a binary target: 1 denotes the primary outcome and 0 the secondary outcome.]

72 Binary Target Predictions
[For each case, the binary target is paired with each prediction type: decisions (primary/secondary), rankings (scores such as 520 and 720), and estimates (probabilities such as 0.249).]

73 Decision Optimization
[Decision predictions (primary/secondary) are compared against the binary target.]

74 Decision Optimization – Accuracy
Maximize accuracy: agreement between outcome and prediction. [A matching primary prediction is a true positive; a matching secondary prediction is a true negative.]

75 Decision Optimization – Misclassification
Minimize misclassification: disagreement between outcome and prediction. [Predicting secondary for a primary outcome is a false negative; predicting primary for a secondary outcome is a false positive.]

76 Ranking Optimization
[Ranking predictions (scores such as 520 and 720) are compared against the binary target.]

77 Ranking Optimization – Concordance
Maximize concordance: proper ordering of primary and secondary outcomes (target = 0 → low score, target = 1 → high score).

78 Ranking Optimization – Discordance
Minimize discordance: improper ordering of primary and secondary outcomes (target = 0 → high score, target = 1 → low score).

79 Estimate Optimization
[Estimate predictions (probabilities such as 0.249) are compared against the binary target.]

80 Estimate Optimization – Squared Error
Minimize squared error: the squared difference (target – estimate)² between target and prediction.

81 Complexity Optimization – Summary
Decisions → accuracy / misclassification. Rankings → concordance / discordance. Estimates → squared error.
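The three assessment statistics can be sketched directly for a binary target (a simplified illustration; tied scores in the concordance calculation are glossed over here):

```python
def accuracy(targets, decisions):
    """Fraction of cases where the decision matches the actual outcome."""
    hits = sum(t == d for t, d in zip(targets, decisions))
    return hits / len(targets)

def concordance(targets, scores):
    """Fraction of (primary, secondary) pairs ranked in the proper order:
    the primary case (target = 1) receives the higher score."""
    pos = [s for t, s in zip(targets, scores) if t == 1]
    neg = [s for t, s in zip(targets, scores) if t == 0]
    proper = sum(p > q for p in pos for q in neg)
    return proper / (len(pos) * len(neg))

def average_squared_error(targets, estimates):
    """Mean of (target - estimate)^2 across all cases."""
    return sum((t - e) ** 2 for t, e in zip(targets, estimates)) / len(targets)
```

Misclassification is simply `1 - accuracy(...)`, and discordance is `1 - concordance(...)` when there are no ties.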

82 4.03 Quiz What are some target variables that you might encounter that would require optimizing on… accuracy/misclassification? concordance/discordance? average squared error?

83 Statistical Graphs ROC Curves Gains and Lift Charts

84 Decision Matrix
[2×2 matrix: rows are the actual class (0, 1), columns the predicted class (0, 1); the cells are true negative, false positive, false negative, and true positive.]

85 Sensitivity
Sensitivity = true positives / actual positives: of the cases whose actual class is 1, the fraction predicted as 1.

86 Positive Predicted Value
Positive predicted value = true positives / predicted positives: of the cases predicted as class 1, the fraction whose actual class is 1.

87 Specificity
Specificity = true negatives / actual negatives: of the cases whose actual class is 0, the fraction predicted as 0.

88 Negative Predicted Value
Negative predicted value = true negatives / predicted negatives: of the cases predicted as class 0, the fraction whose actual class is 0.
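The four rates can be computed from the decision-matrix counts (a minimal sketch; `tp`, `fn`, `fp`, and `tn` are the true positive, false negative, false positive, and true negative counts):

```python
def classification_rates(tp, fn, fp, tn):
    """Rates derived from the 2x2 decision matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # true positives among actual positives
        "specificity": tn / (tn + fp),  # true negatives among actual negatives
        "ppv": tp / (tp + fp),          # positive predicted value
        "npv": tn / (tn + fn),          # negative predicted value
    }
```

These same quantities, computed at a sweep of decision thresholds, are what the ROC curve plots.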

89 ROC Curve

90 Gains Chart

91 Catalog Case Study: Steps to Build a Decision Tree
Add the CATALOG2010 data source to the diagram. Use the Data Partition node to split the data into training and validation data sets. Use the Decision Tree node to select useful inputs. Use the Model Comparison node to generate model assessment statistics and plots.

92 Constructing a Decision Tree Predictive Model
Catalog Case Study Task: Construct a decision tree model.

93 Chapter 4: Predictive Modeling
4.1 Introduction to Predictive Modeling 4.2 Predictive Modeling Using Decision Trees 4.3 Predictive Modeling Using Logistic Regression 4.4 Churn Case Study 4.5 A Note about Model Management 4.6 Recommended Reading

94 Objectives Explain the concepts of logistic regression.
Discuss modeling strategies for building a predictive model. Fit a predictive logistic regression model in SAS Enterprise Miner.

95 Modeling Essentials – Regressions
Prediction formula → determine type of prediction. Variable clustering and selection → select useful inputs. Best model from sequence → optimize complexity.


97 Simple Linear Regression Model
[Scatter plot with the regression best-fit line.]

98 Linear Regression Prediction Formula
ŷ = β̂0 + β̂1x1 + β̂2x2, where x1 and x2 are input measurements, ŷ is the prediction estimate, β̂0 is the intercept estimate, and β̂1 and β̂2 are parameter estimates. Choose the intercept and parameter estimates to minimize the squared error function Σ(yi – ŷi)² over the training data.

99 Binary Target Linear regression does not work, because whatever the form of the equation, the results are generally unbounded. Instead, you work with the probability p that the event will occur rather than a direct classification.

100 Odds Instead of Probability
Consider the probability p of an event (such as a horse losing a race) occurring. The probability of the event not occurring is 1-p. The odds of the event happening are p:(1-p), although you more commonly express this as integers, such as a 19-to-1 long shot at the race track. The ratio 19:1 means that the horse has one chance of winning for every 19 chances of losing, so the probability of winning is 1/(19+1) = 5%.

101 Properties of Odds and Log Odds
Odds are not symmetric: they vary from 0 to infinity and equal 1 when the probability is 50%. Log odds are symmetric, ranging from minus infinity to plus infinity, and equal 0 when the probability is 50%: highly negative for low probabilities and highly positive for high probabilities.
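These properties are easy to verify numerically (a small sketch using the race-track example above):

```python
import math

def odds(p):
    """Odds of an event with probability p."""
    return p / (1 - p)

def log_odds(p):
    """Natural logarithm of the odds."""
    return math.log(odds(p))

# A 19-to-1 long shot: probability of winning is 1/(19 + 1) = 5%,
# so the odds of winning are 0.05 / 0.95 = 1/19.
race_odds = odds(0.05)
```

Note the symmetry of log odds: `log_odds(0.95)` and `log_odds(0.05)` are equal in magnitude and opposite in sign, while the corresponding odds (19 and 1/19) are not symmetric about 1.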

102 Logistic Regression Prediction Formula
log( p̂ / (1 – p̂) ) = β̂0 + β̂1x1 + β̂2x2 (the logit scores)

103 Logit Link Function
log( p̂ / (1 – p̂) ) = β̂0 + β̂1x1 + β̂2x2. The logit link function transforms probabilities (between 0 and 1) to logit scores (between −∞ and +∞).

104 Logit Link Function
logit( p̂ ) = log( p̂ / (1 – p̂) ) = β̂0 + β̂1x1 + β̂2x2. To obtain prediction estimates, the logit equation is solved for p̂: p̂ = 1 / (1 + e^(−logit( p̂ ))).
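The link and its inverse can be sketched as follows:

```python
import math

def logit(p):
    """Logit link: map a probability in (0, 1) to a score in (-inf, +inf)."""
    return math.log(p / (1 - p))

def inverse_logit(score):
    """Solve the logit equation for p: p = 1 / (1 + exp(-score))."""
    return 1.0 / (1.0 + math.exp(-score))
```

Applying `inverse_logit` to a fitted model's logit score is exactly how a logistic regression turns its linear prediction into a probability estimate.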

105 4.04 Poll Linear regression on a binary target is a problem because predictions can range outside of 0 and 1.  Yes  No Yes

106 4.04 Poll – Correct Answer Linear regression on a binary target is a problem because predictions can range outside of 0 and 1.  Yes  No Yes

107 Simple Prediction Illustration – Regressions
Predict dot color for each (x1, x2) using logit( p̂ ) = β̂0 + β̂1x1 + β̂2x2. Need intercept and parameter estimates. [Scatter plot of the training cases on x1 versus x2.]

108 Simple Prediction Illustration – Regressions
Find the parameter estimates by maximizing the log-likelihood function. [Same scatter plot.]

109 Simple Prediction Illustration – Regressions
Using the maximum likelihood estimates, the prediction formula assigns a logit score to each (x1, x2). [Scatter plot overlaid with parallel score contours at 0.40, 0.50, 0.60, and 0.70.]

110 Regressions: Beyond the Prediction Formula
Manage missing values. Interpret the model. Handle extreme or unusual values. Use nonnumeric inputs. Account for nonlinearities.


112 Missing Values and Regression Modeling
Problem 1: Training data cases with missing values on inputs used by a regression model are ignored.

113 Missing Values and Regression Modeling
Consequence: Missing values can significantly reduce your amount of training data for regression modeling!

114 Missing Values and the Prediction Formula
Predict: (x1, x2) = (0.3, ? ) Problem 2: Prediction formulas cannot score cases with missing values.

115 Missing Values and the Prediction Formula
Problem 2: Prediction formulas cannot score cases with missing values.

116 Missing Value Issues Manage missing values.
Problem 1: Training data cases with missing values on inputs used by a regression model are ignored. Problem 2: Prediction formulas cannot score cases with missing values.

117 Missing Value Causes
Manage missing values. Causes include a non-applicable measurement, no match on merge, and a non-disclosed measurement.

118 Missing Value Remedies
Manage missing values. Remedies: impute a synthetic value from the input's distribution, or estimate the missing value from the other inputs, xi = f(x1, … , xp).
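A minimal sketch of the synthetic-distribution remedy, assuming missing values are represented as `None` (mean imputation is the simplest choice; SAS Enterprise Miner's Impute node offers richer distribution-based and model-based methods):

```python
def impute_mean(column):
    """Replace missing values (None) with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)  # assumes at least one observed value
    return [mean if v is None else v for v in column]
```

After imputation, every case has a complete set of inputs, so it is neither dropped from training (Problem 1) nor unscorable (Problem 2).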

119 4.05 Poll Observations with missing values should always be deleted from scoring because a predicted value cannot be determined.  Yes  No No: you can impute a missing value to get predictions.

120 4.05 Poll – Correct Answer Observations with missing values should always be deleted from scoring because a predicted value cannot be determined.  Yes  No No: you can impute a missing value to get predictions.

121 Modeling Essentials – Regressions
Prediction formula → determine type of prediction. Variable clustering and selection, then sequential selection → select useful inputs. Best model from sequence → optimize complexity.

122 Variable Redundancy

123 Variable Clustering
[Diagram: inputs X1–X10 grouped into clusters of correlated variables.] Inputs are selected by cluster representation, expert opinion, or target correlation.

124 Selection by 1 – R² Ratio
For input X2, with R² = 0.90 against its own cluster and R² = 0.01 against the next closest cluster: (1 – R²_own cluster) / (1 – R²_next closest) = (1 – 0.90) / (1 – 0.01) = 0.101.
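The ratio is a one-liner; smaller values flag better cluster representatives (high correlation with the variable's own cluster, low correlation with the next closest one):

```python
def one_minus_r2_ratio(r2_own, r2_next):
    """(1 - R^2 own cluster) / (1 - R^2 next closest cluster)."""
    return (1 - r2_own) / (1 - r2_next)

ratio = one_minus_r2_ratio(0.90, 0.01)  # the slide's X2 example, ~0.101
```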

125 Modeling Essentials – Regressions
Prediction formula → determine type of prediction. Variable clustering and selection, then sequential selection → select useful inputs. Best model from sequence → optimize complexity.

126 Sequential Selection – Forward
[Animation: inputs enter the model one at a time; at each step, the input with the smallest p-value below the entry cutoff is added, until no remaining input qualifies.]

132 Sequential Selection – Backward
[Animation: starting from the full model, at each step the input with the largest p-value above the stay cutoff is removed, until every remaining input qualifies.]


141 Sequential Selection – Stepwise
[Animation: inputs are added as in forward selection, but after each addition any input whose p-value rises above the stay cutoff is removed.]

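The forward variant of the sequential family can be sketched generically, given a function that returns the p-value of a candidate input when added to the currently selected set (that function is hypothetical here; in practice it comes from refitting the regression at each step):

```python
def forward_select(candidates, p_value, entry_cutoff=0.05):
    """Greedy forward selection: at each step add the candidate with the
    smallest p-value, stopping when none falls below the entry cutoff."""
    selected = []
    remaining = list(candidates)
    while remaining:
        best = min(remaining, key=lambda c: p_value(selected, c))
        if p_value(selected, best) >= entry_cutoff:
            break  # no remaining input clears the entry cutoff
        selected.append(best)
        remaining.remove(best)
    return selected
```

Backward elimination runs the loop in reverse (drop the largest p-value above the stay cutoff), and stepwise interleaves the two.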

147 4.06 Poll Different model selection methods often result in different candidate models. No one method is uniformly the best.  Yes  No Yes

148 4.06 Poll – Correct Answer Different model selection methods often result in different candidate models. No one method is uniformly the best.  Yes  No Yes

149 Modeling Essentials – Regressions
Prediction formula → determine type of prediction. Variable clustering and selection → select useful inputs. Best model from sequence → optimize complexity.

150 Model Fit versus Complexity
Evaluate each step in the model sequence. [Plot: model fit statistic versus sequence step (1–6) for training and validation data; the training fit keeps improving while the validation fit levels off.]

151 Select Model with Optimal Validation Fit
Evaluate each sequence step and choose the simplest model with the optimal validation fit. [Same plot, with the chosen step highlighted.]

152 Beyond the Prediction Formula
Manage missing values. Interpret the model. Handle extreme or unusual values. Use nonnumeric inputs. Account for nonlinearities.

153 Interpretation
A unit change in x2 produces a β̂2 change in logit(p), which corresponds to a 100·(exp(β̂2) – 1)% change in the odds.

154 Odds Ratio from a Logistic Regression Model
Estimated logistic regression model: logit(p) = … + …·(gender). Estimated odds ratio (females to males): odds ratio = (e^…)/(e^−0.7567) = 1.55. An odds ratio of 1.55 means that females have 1.55 times the odds of having the outcome compared to males.

155 Properties of the Odds Ratio
[Number line for the odds ratio: at 1 there is no association; below 1, the group in the denominator has higher odds of the event; above 1, the group in the numerator has higher odds of the event.]
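Because the logit is linear in the coefficients, the odds ratio between two groups is the exponential of the difference of their logit coefficients (a sketch; in any real model the coefficient values come from the fitted regression, not from this illustration):

```python
import math

def odds_ratio(coef_group, coef_reference):
    """Odds ratio of one group versus a reference group, computed from
    their estimated logit coefficients: exp(difference in log odds)."""
    return math.exp(coef_group - coef_reference)
```

Equal coefficients give an odds ratio of exactly 1, the "no association" point on the number line above.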

156 Beyond the Prediction Formula
Manage missing values. Interpret the model. Handle extreme or unusual values. Use nonnumeric inputs. Account for nonlinearities.

157 Extreme Distributions and Regressions
[Plot on the original input scale: a skewed input distribution with high leverage points.]

158 Extreme Distributions and Regressions
[Same plot, with the true association between input and target overlaid.]

159 Extreme Distributions and Regressions
[Same plot: the standard regression fit is pulled away from the true association by the high leverage points.]

160 Extreme Distributions and Regressions
[Side by side: the original input scale (skewed distribution, high leverage points) and a regularized scale with a more symmetric distribution.]

161 Regularizing Input Transformations
[Transforming the skewed input to a regularized scale yields a more symmetric distribution for the standard regression.]

162 Regularizing Input Transformations
[Back-transformed to the original scale, the regularized estimate tracks the true association better than the standard regression.]

163 Idea Exchange What are examples of variables with unusual distributions that could produce problems in a regression model? Would you transform these variables? If so, what types of transformations would you entertain?

164 Beyond the Prediction Formula
Manage missing values. Interpret the model. Handle extreme or unusual values. Use nonnumeric inputs. Account for nonlinearities.

165 Nonnumeric Input Coding
Two-level variable coding: level A → (DA, DB) = (1, 0); level B → (DA, DB) = (0, 1). Coding redundancy: DB = 1 – DA, so one indicator is enough.

166 Nonnumeric Input Coding: Many Levels
[Table: levels A–I each coded with its own indicator DA–DI; each row contains a single 1.]

167 Coding Redundancy: Many Levels
[Coding redundancy: DI equals 1 exactly when DA–DH are all 0, so the last indicator is redundant.]

168 Coding Consolidation
[Table: the indicators DA–DI for levels A–I, before consolidation.]

169 Coding Consolidation
[Consolidated coding: levels with similar target behavior share an indicator, for example DABCD (levels A–D), DEF (E–F), DGH (G–H), and DI (I).]
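Both the redundancy-free coding and the consolidation step can be sketched as follows (the level grouping passed to `consolidate` is a hypothetical example; in practice the groups come from levels that behave similarly with respect to the target):

```python
def dummy_code(values, levels):
    """One 0/1 indicator per level except the last, which serves as the
    reference level (avoiding the coding redundancy shown above)."""
    reference = levels[:-1]
    return [[1 if v == lev else 0 for lev in reference] for v in values]

def consolidate(values, groups):
    """Map original levels to consolidated levels, e.g. {'A': 'ABCD', ...}."""
    return [groups[v] for v in values]
```

Consolidating first, then dummy coding the consolidated levels, turns a nine-level input into just a few indicator columns.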

170 Beyond the Prediction Formula
Manage missing values. Interpret the model. Handle extreme or unusual values. Use nonnumeric inputs. Account for nonlinearities.

171 Standard Logistic Regression
log( p̂ / (1 – p̂) ) = ŵ0 + ŵ1x1 + ŵ2x2. [Contour plot: the fitted logit scores form straight, parallel level curves (0.40–0.70) over the x1–x2 plane.]

172 Polynomial Logistic Regression
log( p̂ / (1 – p̂) ) = ŵ0 + ŵ1x1 + ŵ2x2 + ŵ3x1² + ŵ4x2² + ŵ5x1x2, adding quadratic and interaction terms. [Contour plot: the level curves (0.30–0.80) are now curved.]
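Adding the quadratic terms amounts to expanding the input vector before fitting the same linear-in-parameters logistic regression (a sketch):

```python
def quadratic_features(x1, x2):
    """Expand two inputs with their quadratic and interaction terms:
    [x1, x2, x1^2, x2^2, x1*x2]."""
    return [x1, x2, x1 * x1, x2 * x2, x1 * x2]
```

The fitted model is still a weighted sum of features, which is why standard logistic regression software can handle it; only the feature list changes.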

173 Idea Exchange What are some predictors that you can think of that would have a nonlinear relationship with a target? What do you think the functional form of the relationship is (for example, quadratic, exponential, …)?

174 Catalog Case Study Analysis Goal: A mail-order catalog retailer wants to save money on mailing and increase revenue by targeting mailed catalogs to customers most likely to purchase in the future. Data set: CATALOG2010 Number of rows: 48,356 Number of columns: 98 Contents: sales figures summarized across departments and quarterly totals for 5.5 years of sales Targets: RESPOND (binary) ORDERSIZE (continuous)

175 Fitting a Logistic Regression Model
Catalog Case Study Task: Build a logistic regression model in SAS Enterprise Miner.

176 Catalog Case Study: Steps to Build a Logistic Regression Model
Add the CATALOG2010 data source to the diagram. Use the Data Partition node to split the data into training and validation data sets. Use the Variable Clustering node to select relatively independent inputs. Use the Regression node to select relevant inputs. Use the Model Comparison node to generate model assessment statistics and plots.  In the previous example, you performed steps 1 and 2.

177 Chapter 4: Predictive Modeling
4.1 Introduction to Predictive Modeling 4.2 Predictive Modeling Using Decision Trees 4.3 Predictive Modeling Using Logistic Regression 4.4 Churn Case Study 4.5 A Note about Model Management 4.6 Recommended Reading

178 Objectives Formulate an objective for predictive churn in a telecommunications example. Generate predictive models in SAS Enterprise Miner to predict churn. Score a customer database to target who is most likely to churn.

179 Telecommunications Company
Mobile (prepaid and postpaid) and fixed service provider. In recent years, a high percentage of high revenue subscribers have churned. Company wants to target subscribers with a high churn probability for its customer retention program.

180 Churn Score A churn propensity score measures the propensity for an active customer to churn. The score enables marketing managers to take proactive steps to retain targeted customers before churn occurs. Churn scores are derived from analysis of the historical behavior of churned customers and existing customers who have not churned.

181 Possible Predictor Variables
Outstanding bill value Outstanding balance period Number of calls Call duration (international, local, national calls) Period as customer Total dropped calls Total failed calls

182 Model Implementation inputs predictions Predictions might be added to a data source inside or outside of SAS Enterprise Miner.

183 Churn Case Study Examine the CHURN_TELECOM data set and add it to a diagram. Partition the data into training and validation data sets. Perform missing value imputation. Recode nominal variables to combine class levels. Reduce redundancy with variable clustering. Reduce irrelevant inputs with a decision tree and a logistic regression. Compare results and select the final model based on validation error. Score a data set to generate the list of churn-risk customers.

184 Analyzing Churn Data Churn Case Study Task: Analyze churn data.

185 Chapter 4: Predictive Modeling
4.1 Introduction to Predictive Modeling 4.2 Predictive Modeling Using Decision Trees 4.3 Predictive Modeling Using Logistic Regression 4.4 Churn Case Study 4.5 A Note about Model Management 4.6 Recommended Reading

186 Objectives Discuss the movement of analytics from the “back office” to the executive level and the reasons for these changes. Describe the three-way pull for model management. Explain why models must be maintained and reassessed over time.

187 Model Management and Business Analytics
Model management is the assessment, deployment, and continued modification of models. It is a critical business process: demonstrate that the model is well developed, verify that the model is working well, and perform outcomes analysis. Model management requires a collaborative effort across the company: VP Decision Analysis and Support Group, Senior Modeling Analyst, Enterprise Architect, Internal Validation Compliance Analyst, Database Administrator. Many companies now recognize that sound model management is not a one-time activity but an ongoing, critical business process. Its key elements are as follows. First, the model must be well developed: models should be logical on an a priori basis, built using well-defined and generally accepted techniques and methods, and robust enough to generalize well when applied to new data. Second, given a well-developed model, confirm that it is implemented as designed. This entails checking the scoring (computer) code that deploys the model; checking the quality of the data (are the correct inputs available and properly defined?); documenting the model properly; and comparing model predictions with other sources of data and with challenger models. Third, perform outcomes analysis, in which, where practical, model predictions are compared to actual outcomes. Changing market and operating conditions dictate that this is an ongoing, iterative process. Implementing these key elements requires a collaborative effort across the roles listed above. Their responsibilities include: creating and institutionalizing processes and procedures for model development and implementation; directing model design, development, and implementation consistent with both internal and external policies and procedures; defining model performance reports and monitoring model performance on an ongoing basis; performing sample selection and design;
carrying out statistical data discovery, transformation, and enrichment; developing candidate predictive and descriptive models; preparing model performance reports and monitoring model performance on an ongoing basis; recalibrating or refining models that do not meet performance targets; defining the system and data architectures required to support business processes; collecting and preparing data sources for model development and deployment; validating score code execution; deploying models into operational scoring systems (call centers, credit origination systems, campaign management, and so on); and performing independent model validation reviews to ensure that models perform correctly and comply with model specifications.

188 Analytical Model Management Challenges
Proliferation of data and models. Largely manual processes for moving models to production. Turning model output into actionable inferences. Integrating with operational systems. Increased regulation (Sarbanes-Oxley, Basel II) demands close attention to government requirements.

189 Three-Way Pull for Model Management
Business Value Governance Process Production Process Adrian earlier touched on the challenges of a growing model portfolio, manual model deployment, tighter integration with operational systems, and increased regulatory oversight. To overcome these challenges, three influences pull for a model management process. First, it is critical to understand the business value of your analytical models: how can we be more effective in managing the portfolio of models? Second, you must have a specific process to move analytical models from development into the production environment: how can we ensure that models are deployed efficiently without impacting production systems? Finally, you must document the actions taken throughout the analytical model management process: who is being held accountable for the success of the models, and are we providing appropriate documentation for regulatory compliance?

190 Three-Way Pull for Model Management
Business Value Deployment of the “best” models Consistent model development and validation Understanding of model strategy and lifetime value Production Process Efficient deployment of models in a timely manner Effective deployment to minimize operational risk Governance Process Audit trails for compliance purposes Justification for management and shareholders At the core of analytical model development is understanding the business value the models bring to the organization. Models must be evaluated as part of the enterprise, in context with other analytical models already in place or being evaluated as champions. Consistency of development extends across the entire model development cycle, from data staging through model generation through model consumption. That consistency provides a common method for measuring value, allowing apples-to-apples comparisons of results. Over time, all analytical models tend to lose their value: behaviour patterns shift and environmental factors change. So it is also important to retire models at the appropriate time, before they have a negative impact on the business. Once an analytical model has been generated, the next step is to deploy it into the production environment. Models do not have an infinite lifetime, so speed to market is definitely a factor. Clearly defined processes lend themselves to efficiency, and handoffs between different departments or personnel should be clearly defined and understood. By having repeatable, controlled, and documented processes, models can be deployed in the most efficient and effective manner. Governance involves both external and internal requirements for accountability. This includes enforcing ownership of the dev/test/production process, supporting version control, and providing documentation for each step of the process.
Governance also includes being accountable to corporate management and shareholders for the results that are ultimately generated.

191 Changes in the Analytical Landscape
Now… (Diagram: the stakeholders in model management now span Customer Service, Retail, Logistics, Promotions, Operations, Management, Customers, Analytical Modelers, IT Ops, Data Integrators, Business Governance, Suppliers, Employees, and Stockholders.) To deal with all these demands on model management, the number of people involved in the process expands outside the strict domain of the analytical developer. It now involves people from across the organization: business users, IT operations, data architects, and others. So instead of individual modelers managing their own process, an enterprise process is in place in which the different steps of model deployment are handled by multiple individuals in multiple departments. Collaboration between modelers and departments becomes a key factor in successful implementations of analytical models in the production environment.

192 Model Management As models proliferate, you need:
To be more diligent, but… There is no established process to handle model deployment into production. Model deployment is inefficient. More individuals and groups in the organization must be involved in the process. To be more vigilant, but… It is difficult to effectively manage existing models and track the model life cycle. It is difficult to consistently provide appropriate internal and regulatory documentation. The goals for model management are straightforward. Organizations would like to be more diligent about how models are deployed, and they want to be more vigilant in understanding the performance of their models. However, they are struggling with inefficient deployment mechanisms and have been ineffective in managing their model portfolios. Models do not have an infinite lifetime: they are built upon a snapshot of data, and the longer it takes to deploy a model into production, the less value that model can generate. There must be a process in place to get models into production more quickly, and all stakeholders involved with implementing the model need to be part of that process. Once a model is deployed into the production environment, organizations need support to remain vigilant about its ongoing performance. Oftentimes a model is deployed and then “forgotten”: it continues to run over time, but it is based on old assumptions and outdated behavior patterns, and no mechanism is in place to warn about degrading performance.
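The "deployed then forgotten" failure mode can be guarded against with a simple performance monitor that compares live error rates against the baseline measured at deployment. A sketch with hypothetical threshold values and made-up monthly error rates:

```python
BASELINE_ERROR = 0.10  # misclassification rate measured at deployment time
TOLERANCE = 0.05       # hypothetical allowed degradation before review

def needs_review(monthly_error_rates):
    """Return the first month (1-based) whose error breaches the
    baseline plus tolerance, or None if the model is still healthy."""
    for month, err in enumerate(monthly_error_rates, start=1):
        if err > BASELINE_ERROR + TOLERANCE:
            return month
    return None

# Behavior patterns shift: the deployed model's error creeps upward.
history = [0.10, 0.11, 0.12, 0.13, 0.17, 0.21]
print(needs_review(history))  # 5
```

A check like this, run as part of the regular scoring job, gives the warning mechanism the slide says is usually missing: the model is flagged for recalibration or retirement instead of silently degrading.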

193 Idea Exchange How can you implement model management in your organization? Do you already have systems in place for continuous improvement and monitoring of models? For audit trails and compliance checks? Describe briefly how they operate.

194 Lessons Learned Model management is a key part of good business analytics. Models should be evaluated before, during, and after deployment. New models replace old ones as dictated by the data over time. Data mining comes in two forms. Directed data mining is searching through historical records to find patterns that explain a particular outcome; it includes the tasks of classification, estimation, prediction, and profiling. Undirected data mining is searching through the same records for interesting patterns; it includes the tasks of clustering, finding association rules, and description. The primary lesson of this chapter is that data mining is full of traps for the unwary, and following a data mining methodology based on experience can help avoid them. The first hurdle is translating the business problem into one of the six tasks that can be solved by data mining: classification, estimation, prediction, affinity grouping, clustering, and profiling. The next challenge is to locate appropriate data that can be transformed into actionable information. Once the data has been located, it should be explored thoroughly. The exploration process is likely to reveal problems with the data, and it will also build up the data miner's intuitive understanding of the data. The next step is to create a model set and partition it into training, validation, and test sets. Data transformations are necessary for two purposes: to fix problems with the data, such as missing values and categorical variables that take on too many values, and to bring information to the surface by creating new variables to represent trends and other ratios and combinations. Once the data has been prepared, building models is a relatively easy process. Each type of model has its own metrics by which it can be assessed, but there are also assessment tools that are independent of the type of model.
Some of the most important of these are the lift chart, which shows how the model has increased the concentration of the desired value of the target variable, and the confusion matrix, which shows the misclassification error rate for each of the target classes.
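Both model-independent assessment tools named above can be computed directly from scored records. A stdlib-only sketch using toy (probability, actual) pairs, not course data, that builds a 2x2 confusion matrix at a 0.5 cutoff and computes lift in the top 20% of scores:

```python
# (predicted probability, actual class) pairs from a hypothetical model.
scored = [
    (0.95, 1), (0.90, 1), (0.80, 0), (0.70, 1), (0.60, 0),
    (0.40, 1), (0.30, 0), (0.20, 0), (0.15, 0), (0.05, 0),
]

# Confusion matrix at a 0.5 classification cutoff.
tp = sum(1 for p, y in scored if p >= 0.5 and y == 1)  # true positives
fp = sum(1 for p, y in scored if p >= 0.5 and y == 0)  # false positives
fn = sum(1 for p, y in scored if p < 0.5 and y == 1)   # false negatives
tn = sum(1 for p, y in scored if p < 0.5 and y == 0)   # true negatives
print((tp, fp, fn, tn))  # (3, 2, 1, 4)

# Lift in the top 20%: response rate among the highest-scored records
# divided by the overall response rate.
overall_rate = sum(y for _, y in scored) / len(scored)  # 0.4
top = sorted(scored, reverse=True)[: len(scored) // 5]
top_rate = sum(y for _, y in top) / len(top)            # 1.0
print(top_rate / overall_rate)  # 2.5
```

A lift of 2.5 means the model concentrates responders in its top scores at 2.5 times the background rate — exactly what a lift chart plots, decile by decile.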

195 Chapter 4: Predictive Modeling
4.1 Introduction to Predictive Modeling 4.2 Predictive Modeling Using Decision Trees 4.3 Predictive Modeling Using Logistic Regression 4.4 Churn Case Study 4.5 A Note about Model Management 4.6 Recommended Reading

196 Recommended Reading Davenport, Thomas H., Jeanne G. Harris, and Robert Morison. Analytics at Work: Smarter Decisions, Better Results. Boston: Harvard Business Press. Chapters 7 and 8 focus on making analytics an integral part of a business: systems, processes, and organizational culture must work together to move toward analytical leadership. The remaining three chapters of the book (9-11) are optional, self-study material.

197 Recommended Reading May, Thornton. The New Know: Innovation Powered by Analytics. New York: Wiley. Chapter 1. May's book provides a counterpoint to the Davenport et al. book, from the perspective of the role of analysts in the organization and how organizations can make the best use of their analytical talent.

198 Recommended Reading Morris, Michael. “Mining Student Data Could Save Lives.” The Chronicle of Higher Education, October 2. This article discusses the mining of student data at colleges and universities to prevent large-scale acts of violence on campus. Mining students' data (including Internet usage and social networking data) would enhance the capacity of threat-assessment teams to protect the health and safety of students.

