Make every interaction count™
Decision Trees: Profiling and Segmentation
Sachin Chincholi, Professional Services
Portrait Software Copyright 2007 CUSTOMER CONFIDENTIAL
How to ask a Question
Decision Trees: Profiling and Segmentation
– Presenter: Sachin Chincholi, Professional Services
– Audience: Existing Quadstone Users
Decision Trees for insight
+ Transparent
  – Easily understandable by non-statisticians
  – Sanity check your modelling framework
    – Is your objective defined correctly?
    – Are the initial splits plausible?
+ Fast to build
  – Quick alert to possible contamination
Decision Trees for Modeling
+ Transparent
  – Easier to get buy-in from the business
  – Easy to code
+ Non-parametric
  – No assumptions about underlying distributions of Analysis Candidates
+ Non-linear
  – Allow easy discovery of non-linear patterns (age vs. income)
– ‘Unstable’
  – Different populations give very different trees
Interpreting a decision tree
[Figure: example decision tree with nodes #1, #2 and #3. Node #1 reads "Objective: Response, match = 26.2%" and is annotated "Match rate for the objective over the entire population". The first split is on Age at 40 (< 40 / ≥ 40), annotated "The split at Age = 40 is the most predictive". Field labels: Age, Income. Color is used to show match rates.]
Decision tree build process
– Given an objective, Decision Tree Builder finds the most predictive split among all possible splits, across all analysis candidates, given the current binnings
– The population is then split into two segments based on this split
– The same method splits each of the two segments into two further segments
– This process continues until the tree is finished, as determined by the tree constraints
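The build process above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Decision Tree Builder's implementation: the record layout (dicts with a 0/1 `match` field), the function names, the `max_depth` and `min_size` constraints, the use of distinct field values as the "current binnings", and entropy-based gain as the quality value are all hypothetical choices made for the example.

```python
import math

def entropy(records):
    """Entropy (in bits) of the 0/1 'match' field over a list of records."""
    if not records:
        return 0.0
    p = sum(r["match"] for r in records) / len(records)
    return -sum(q * math.log2(q) for q in (p, 1 - p) if q > 0)

def best_split(records, candidates):
    """Return (gain, field, threshold) for the most predictive binary split."""
    best = None
    base = entropy(records)
    for field in candidates:
        # Assumed binning: every distinct value except the smallest is a cut point.
        for t in sorted({r[field] for r in records})[1:]:
            left = [r for r in records if r[field] < t]
            right = [r for r in records if r[field] >= t]
            weighted = (len(left) * entropy(left)
                        + len(right) * entropy(right)) / len(records)
            gain = base - weighted
            if best is None or gain > best[0]:
                best = (gain, field, t)
    return best

def build_tree(records, candidates, max_depth=3, min_size=2, depth=0):
    """Recursively split until the (hypothetical) tree constraints stop growth."""
    rate = sum(r["match"] for r in records) / len(records)
    node = {"n": len(records), "match_rate": rate}
    if depth >= max_depth or len(records) < 2 * min_size:
        return node
    split = best_split(records, candidates)
    if split is None or split[0] <= 0:
        return node
    _, field, t = split
    left = [r for r in records if r[field] < t]
    right = [r for r in records if r[field] >= t]
    if len(left) < min_size or len(right) < min_size:
        return node
    node.update(
        field=field, threshold=t,
        below=build_tree(left, candidates, max_depth, min_size, depth + 1),
        at_or_above=build_tree(right, candidates, max_depth, min_size, depth + 1),
    )
    return node

# Made-up data: age separates matches from non-matches, income does not.
ages = [25, 30, 35, 45, 50, 55]
incomes = [20, 50, 30, 40, 25, 60]
matches = [0, 0, 0, 1, 1, 1]
records = [{"age": a, "income": i, "match": m}
           for a, i, m in zip(ages, incomes, matches)]
tree = build_tree(records, ["age", "income"])
```

On this toy data the root splits on `age`, and both child segments are too small to split further under the assumed `min_size`, so the tree stops after one level.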
Choice of a decision tree split
– Each possible split is assigned a quality value
– The splits are ranked by quality value
– The quality value depends on the tree type:
  – Binary outcome tree and classification tree: Information gain
  – Regression tree: R²
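For the regression case, a quality value of this kind might be computed as below. This is a sketch only: it assumes R² means the share of the objective's variance explained by the two segments (1 minus within-segment sum of squares over total sum of squares), which is the textbook definition and may differ in detail from what the Quadstone System computes. The function name is hypothetical, and both segments are assumed to be non-empty.

```python
def split_r_squared(left, right):
    """R^2 of a two-segment split: share of objective variance explained."""
    values = left + right
    mean = sum(values) / len(values)
    ss_total = sum((v - mean) ** 2 for v in values)
    if ss_total == 0:
        return 0.0  # a constant objective leaves nothing to explain
    ss_within = 0.0
    for segment in (left, right):
        seg_mean = sum(segment) / len(segment)
        ss_within += sum((v - seg_mean) ** 2 for v in segment)
    return 1 - ss_within / ss_total

perfect = split_r_squared([1, 1, 1], [3, 3, 3])   # segments explain everything
useless = split_r_squared([1, 3], [1, 3])         # segments explain nothing
```

Ranking candidate splits then amounts to sorting them by this value, highest first.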
Choice of a decision tree split (2)
[Figure: candidate splits for Objective: Response at Level 1. Analysis candidates shown: Age, Income, LoanAmount, MaritalStatus (categories: Single, Married, Widow, Misc).]
Splitting criterion
– $\text{Information} = \sum_{c=1}^{n} p(c)\,\log p(c)$
– That is, the sum of proportion(c) × log(proportion(c)) over all classes c
– Equivalent to a likelihood-ratio test for comparing two populations
– Seeks to separate out classes, while minimising small nodes
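A worked version of the criterion, as a sketch. It keeps the sign convention of the formula above (Information is at most 0, and 0 for a pure node), and it assumes that the gain of a split is the size-weighted Information of the child segments minus the Information of the parent. The function names and the class-count inputs are hypothetical.

```python
import math

def information(counts):
    """Sum of p(c) * log(p(c)) over classes, from per-class record counts."""
    total = sum(counts)
    return sum((c / total) * math.log(c / total) for c in counts if c > 0)

def information_gain(parent, children):
    """Size-weighted child Information minus parent Information (>= 0)."""
    total = sum(parent)
    weighted = sum(sum(ch) / total * information(ch) for ch in children)
    return weighted - information(parent)

# 50 matches and 50 non-matches in the parent segment.
clean_split = information_gain([50, 50], [[50, 0], [0, 50]])
no_split = information_gain([50, 50], [[25, 25], [25, 25]])
```

With natural logarithms, 2 × N × gain (N being the parent segment size) is the G statistic of a likelihood-ratio test on the split's contingency table, which is one way to read the equivalence noted above. A split that only carves off a very small node moves the weighted sum very little, so it scores a low gain.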
Is the decision tree any good (binary case)?
[Figure: Gini “curve”. Records are sorted by predicted propensity; the axes are proportion of actual non-matches and proportion of actual matches, each running from 0 to 1.]
Calculating the Gini value
– Gini = A/B × 100%
[Figure: Gini “curve” with areas A and B marked.]
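The areas A and B are defined only in the figure, so the sketch below rests on an assumption: A is taken as the area between the Gini curve and the diagonal, and B as the area between a perfect model's curve and the diagonal (0.5 when the curve plots cumulative proportion of actual matches against cumulative proportion of actual non-matches). The function name is hypothetical, and tied propensities are not given special treatment.

```python
def gini(scores, matches):
    """Gini as a percentage, from predicted propensities and 0/1 outcomes."""
    # Sort by predicted propensity, highest first.
    ranked = sorted(zip(scores, matches), key=lambda pair: -pair[0])
    n_match = sum(matches)
    n_non = len(matches) - n_match
    y = 0.0            # cumulative proportion of actual matches
    area_under = 0.0   # area under the curve
    for _, m in ranked:
        if m:
            y += 1 / n_match          # step up
        else:
            area_under += y / n_non   # step right by 1/n_non at height y
    a = area_under - 0.5   # assumed A: area between curve and diagonal
    b = 0.5                # assumed B: same area for a perfect model
    return a / b * 100

perfect = gini([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0])
mixed = gini([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
```

Under these assumptions a perfect ranking gives 100%, a ranking no better than chance gives about 0%, and the value equals 2 × AUC − 1 for the corresponding ROC curve.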
Gini “curves”
[Figure: two panels, "Perfect model" and "Totally unpredictive model".]
Overfitting
[Figure: predictive power against complexity (relative to dataset size). Labels: apparent, actual, overfitting.]
Best Practice
– Derive a Training-Test field
– Group “too small” categories
– Reduce number of categories
– Watch number of responses per node
– (Watch confidence intervals of prediction)
– Auto-pruning
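As an illustration of the first item, a Training-Test field can be derived by assigning each record at random to one of two partitions. The sketch below is generic Python rather than Quadstone's own derivation syntax; the field name `TrainingTest`, the 30% test fraction and the fixed seed are all assumptions made for the example.

```python
import random

def derive_training_test(records, test_fraction=0.3, seed=42):
    """Return copies of the records with a hypothetical 'TrainingTest' field."""
    rng = random.Random(seed)  # fixed seed keeps the partition reproducible
    return [
        dict(r, TrainingTest="Test" if rng.random() < test_fraction else "Training")
        for r in records
    ]

customers = [{"id": i} for i in range(1000)]
labelled = derive_training_test(customers)
n_test = sum(r["TrainingTest"] == "Test" for r in labelled)
```

The tree is then built on the Training records only, and its predictive power is checked on the Test records to see how much of the apparent performance is overfitting.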
Confidence interval for 100 responses…
[Figure: Mean, Upper and Lower lines; surviving axis tick: 100,000.]
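To give a feel for the numbers, the sketch below computes an approximate 95% confidence interval for a response rate when a node contains 100 responses. It uses the normal approximation to the binomial, which is an assumption and not necessarily the method behind the chart; the function name and the population sizes of 1,000 and 10,000 are also illustrative (only the 100,000 tick survives from the slide).

```python
import math

def response_rate_interval(responses, population, z=1.96):
    """(lower, mean, upper) for a response rate, normal approximation."""
    p = responses / population
    half_width = z * math.sqrt(p * (1 - p) / population)
    return max(0.0, p - half_width), p, min(1.0, p + half_width)

intervals = {
    n: response_rate_interval(100, n)
    for n in (1_000, 10_000, 100_000)
}
for n, (lower, mean, upper) in intervals.items():
    print(f"population {n:>7,}: {lower:.4%} .. {mean:.4%} .. {upper:.4%}")
```

Under this approximation, 100 responses pin the rate down to within roughly a fifth of its own value whatever the population size, which is why the number of responses per node matters more than the number of records.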
Confidence intervals
What makes a good segment?
– If this is the average…
– Is this worth knowing?
– Is this?
Possible splits scale exponentially
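A short enumeration shows why. For a categorical field with k unordered categories there are 2^(k−1) − 1 ways to divide the categories into two non-empty groups. The sketch below lists them; the function name is hypothetical, and the category names are borrowed from the MaritalStatus example earlier in the deck.

```python
from itertools import combinations

def binary_splits(categories):
    """All ways to divide distinct categories into two non-empty groups."""
    cats = list(categories)
    first, rest = cats[0], cats[1:]
    splits = []
    # Pin the first category to the left group so mirror images are not
    # counted twice.
    for size in range(len(rest) + 1):
        for extra in combinations(rest, size):
            left = {first, *extra}
            right = set(cats) - left
            if right:
                splits.append((left, right))
    return splits

marital = binary_splits(["Single", "Married", "Widow", "Misc"])
counts = {k: len(binary_splits(range(k))) for k in (2, 5, 10, 15)}
```

Four categories give 7 candidate splits and fifteen give 16,383, so each extra category roughly doubles the search. This is one reason to group "too small" categories and reduce the number of categories before building. An ordered field with k bins, by contrast, has only k − 1 cut points.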
Reporting on your model
– Audit the model you build
– Monitor future ‘through the door’ populations
Where to find out more
– Quadstone System Support website:
  – Documentation
  – What’s new in the Quadstone System 5.3 release notes
  – Updated Quadstone System help (F1)
  – Updated Quadstone System data-build command and TML reference
  – Updated Data Build Manager reference
  – Updated Quadstone System administration reference
  – Customer-specific release notes
– Quadstone System Support
  – Web Site:
  – Tel: US ; All
Questions?
EMEA (Headquarters): The Smith Centre, The Fairmile, Henley-on-Thames, Oxfordshire, RG9 6AB, United Kingdom. T: +44 (0) F: +44 (0)
The Americas: 125 Summer Street, 16th Floor, Boston MA 02110, USA. T: F:
Asia Pacific: Level Young Street, Sydney NSW 2000, Australia. F:
Monday, February 22, 2016