Make every interaction count™ Decision Trees: Profiling and Segmentation Sachin Chincholi, Professional Services

Presentation transcript:

Decision Trees: Profiling and Segmentation
Sachin Chincholi, Professional Services

Portrait Software Copyright 2007. CUSTOMER CONFIDENTIAL
How to ask a Question

Decision Trees: Profiling and Segmentation
– Presenter: Sachin Chincholi, Professional Services
– Audience: Existing Quadstone Users

Decision Trees for insight
+ Transparent
  – Easily understandable by non-statisticians
  – Sanity check your modelling framework
  – Is your objective defined correctly?
  – Are the initial splits plausible?
+ Fast to build
  – Quick alert to possible contamination

Decision Trees for Modeling
+ Transparent
  – Easier to get buy-in from the business
  – Easy to code
+ Non-parametric
  – No assumptions about underlying distributions of Analysis Candidates
+ Non-linear
  – Allow easy discovery of non-linear patterns (age vs. income)
− ‘Unstable’
  – Different populations give very different trees

Interpreting a decision tree
[Figure: a decision tree for Objective: Response. Root node #1 shows match = 26.2%, the match rate for the objective over the entire population. The split at Age = 40 (< 40 vs. ≥ 40) is the most predictive and produces nodes #2 and #3, one of which is annotated 50.2%; lower levels split on Age and Income. Color is used to show match rates.]

Decision tree build process
– Given an objective, Decision Tree Builder will find the most predictive split among all possible splits, with all analysis candidates, given the current binnings
– The population is then split into two segments based on this
– The same method splits each of the two segments into two further segments
– This process continues until the tree is finished, as determined by the tree constraints
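The build process above can be sketched as a greedy recursive procedure. This is a minimal illustration, not Decision Tree Builder's actual implementation: the function names, the `max_depth` and `min_size` constraints, the sample rows, and the choice to treat every distinct value as a bin boundary are all assumptions made for the example. The quality value is an information gain in the sense used for binary outcome trees.

```python
import math

def information(rows):
    """Sum of p * log2(p) over the two outcome classes (always <= 0)."""
    p = sum(r["response"] for r in rows) / len(rows)
    return sum(q * math.log2(q) for q in (p, 1.0 - p) if q > 0)

def best_split(rows, fields, min_size):
    """Return the highest-gain two-way split, or None if none is allowed."""
    parent = information(rows)
    best = None
    for field in fields:
        # Assumption: every distinct value is a bin boundary.
        for cut in sorted({r[field] for r in rows})[1:]:
            left = [r for r in rows if r[field] < cut]
            right = [r for r in rows if r[field] >= cut]
            if min(len(left), len(right)) < min_size:
                continue  # hypothetical constraint: no tiny segments
            gain = (len(left) * information(left)
                    + len(right) * information(right)) / len(rows) - parent
            if best is None or gain > best["gain"]:
                best = {"gain": gain, "field": field, "cut": cut,
                        "left": left, "right": right}
    return best

def build_tree(rows, fields, max_depth=3, min_size=2, depth=0):
    """Split the population in two, then apply the same method to each half."""
    node = {"size": len(rows),
            "match_rate": sum(r["response"] for r in rows) / len(rows)}
    if depth >= max_depth:
        return node
    split = best_split(rows, fields, min_size)
    if split is None or split["gain"] <= 1e-12:
        return node  # nothing predictive left to split on
    node.update(field=split["field"], cut=split["cut"])
    node["below"] = build_tree(split["left"], fields, max_depth, min_size, depth + 1)
    node["at_or_above"] = build_tree(split["right"], fields, max_depth, min_size, depth + 1)
    return node

# Made-up data in which response depends only on age.
rows = [
    {"age": 25, "income": 50, "response": 0},
    {"age": 30, "income": 20, "response": 0},
    {"age": 35, "income": 40, "response": 0},
    {"age": 45, "income": 30, "response": 1},
    {"age": 50, "income": 60, "response": 1},
    {"age": 60, "income": 10, "response": 1},
]
tree = build_tree(rows, ["age", "income"])
```

In this toy data the root splits on age, and both resulting segments are pure, so the recursion stops there.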

Choice of a decision tree split
– Each possible split is assigned a quality value
– The splits are ranked
– The quality value depends on the tree type:
  – Binary outcome tree and classification tree: Information gain
  – Regression tree: R²
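For the regression case, ranking splits by R² might look like the sketch below. It assumes R² means the share of the objective's variance explained by the two segment means (1 minus within-segment sum of squares over total sum of squares); the function names and the candidate splits are invented for illustration.

```python
def r_squared(left, right):
    """R^2 of a two-way split: 1 - SS_within / SS_total."""
    def sum_sq(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    total = sum_sq(left + right)
    if total == 0:
        return 0.0  # objective is constant, nothing to explain
    return 1.0 - (sum_sq(left) + sum_sq(right)) / total

def rank_splits(candidates):
    """Sort candidate splits from best to worst quality value."""
    scored = [(r_squared(left, right), name)
              for name, (left, right) in candidates.items()]
    return sorted(scored, reverse=True)

# Hypothetical objective values falling on each side of two candidate splits.
candidates = {
    "Age < 40": ([10.0, 12.0, 11.0], [30.0, 31.0, 29.0]),
    "Income < 30k": ([10.0, 30.0, 11.0], [12.0, 31.0, 29.0]),
}
ranking = rank_splits(candidates)
for quality, name in ranking:
    print(f"{name}: R^2 = {quality:.3f}")
```

Here the age split separates low and high objective values almost perfectly, so it ranks first.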

Choice of a decision tree split (2)
[Figure: candidate splits for Objective: Response at Level 1, across the analysis candidates Age, Income, LoanAmount and MaritalStatus (categories Single, Married, Widow, Misc).]

Splitting criterion
– $\mathrm{Information} = \sum_{c=1}^{n} p(c)\,\log p(c)$
– Sum of (proportion(C) × log(proportion(C))) for all C’s
– Equivalent to likelihood-ratio test for comparing two populations
– Seeks to separate out classes, while minimising small nodes
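A small worked case may make the criterion concrete. The slide does not give the base of the logarithm or the exact gain formula, so the following assumes base 2 and defines the gain of a split as the size-weighted information of the two child segments minus the information of the parent; the symbols $S$, $S_L$, $S_R$, $N$, $N_L$, $N_R$ are introduced here for illustration only.

```latex
\begin{aligned}
I(S) &= \sum_{c=1}^{n} p(c)\,\log_2 p(c) \;\le\; 0, \\
\mathrm{Gain} &= \frac{N_L}{N}\,I(S_L) + \frac{N_R}{N}\,I(S_R) - I(S) \;\ge\; 0. \\[4pt]
\text{Parent, 50\% matches:}\quad I(S) &= 2 \times \tfrac{1}{2}\log_2\tfrac{1}{2} = -1, \\
\text{two pure children:}\quad \mathrm{Gain} &= 0 - (-1) = 1 \text{ bit}, \\
\text{two 50/50 children:}\quad \mathrm{Gain} &= -1 - (-1) = 0.
\end{aligned}
```

Under these assumptions the link to the likelihood-ratio test is that the standard G statistic for the 2 × 2 table of segment by outcome equals $2N\ln 2 \cdot \mathrm{Gain}$ when the gain is measured in bits.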

Is the decision tree any good (binary case)?
[Figure: Gini “curve”. Records are sorted by predicted propensity; the axes are the proportion of actual non-matches and the proportion of actual matches, each running from 0 to 1.]

Calculating the Gini value
– Gini = A/B × 100%
[Figure: Gini “curve” with the areas A and B marked.]
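The slide's figure is not recoverable, so the following is only one plausible reading of Gini = A/B × 100%. It assumes the curve plots the cumulative proportion of actual matches against the cumulative proportion of actual non-matches, that A is the area between the curve and the diagonal, and that B is the same area for a perfect model (0.5 with these axes). The function name and sample data are hypothetical.

```python
def gini_percent(scores, labels):
    """Gini = A/B x 100% under the assumptions stated above.

    scores: predicted propensities; labels: 1 for a match, 0 for a non-match.
    """
    n_match = sum(labels)
    n_non = len(labels) - n_match
    if n_match == 0 or n_non == 0:
        raise ValueError("need at least one match and one non-match")

    # Sort by predicted propensity, highest first.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])

    area = 0.0
    x = y = 0.0
    i = 0
    while i < len(ranked):
        # Records with tied scores form a single step of the curve.
        j = i
        matches = non_matches = 0
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            matches += ranked[j][1]
            non_matches += 1 - ranked[j][1]
            j += 1
        new_x = x + non_matches / n_non
        new_y = y + matches / n_match
        area += (new_x - x) * (y + new_y) / 2.0  # trapezoid under the curve
        x, y = new_x, new_y
        i = j

    a = area - 0.5  # area between the curve and the diagonal
    b = 0.5         # the same area for a perfect model
    return a / b * 100.0

perfect = gini_percent([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0])
useless = gini_percent([0.5, 0.5, 0.5, 0.5], [1, 0, 1, 0])
```

With this definition a perfect ranking scores 100% and a model that cannot separate matches from non-matches scores 0%.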

Gini “curves”
[Figure: two example curves, one for a perfect model and one for a totally unpredictive model.]

Overfitting
[Figure: predictive power plotted against complexity (relative to dataset size), with curves labeled “apparent” and “actual” and an annotation marking overfitting.]

Best Practice
– Derive a Training-Test field
– Group “too small” categories
– Reduce number of categories
– Watch number of responses per node
– (Watch confidence intervals of prediction)
– Auto-pruning
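As an illustration of the first item, a Training-Test field can be derived deterministically from a record identifier. This sketch is not the Quadstone derivation: the function name, the use of a customer ID, the SHA-256 hash, and the 30% test share are all assumptions chosen for the example.

```python
import hashlib

def training_test_field(customer_id, test_fraction=0.3):
    """Assign a record to "Training" or "Test" based on a hash of its ID.

    Hashing keeps the assignment stable across reruns and data refreshes.
    """
    digest = hashlib.sha256(str(customer_id).encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # uniform value in [0, 1)
    return "Test" if bucket < test_fraction else "Training"

assignments = [training_test_field(i) for i in range(10000)]
test_share = assignments.count("Test") / len(assignments)
print(f"Share of records held out for testing: {test_share:.3f}")
```

The tree is then built on the Training records only, and its predictive power is checked on the Test records to detect overfitting.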

Confidence interval for 100 responses…
[Figure: Mean, Upper and Lower series; the axis tick labels are only partly recoverable (… ,000, 100,000).]
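The chart itself is lost, so the following is a guess at the kind of calculation behind it rather than a reconstruction. It assumes a normal-approximation 95% interval for a response rate and holds the number of responses fixed at 100 while the node size varies; the function name and the node sizes are hypothetical.

```python
import math

def response_rate_interval(responses, population, z=1.96):
    """Normal-approximation interval (lower, mean, upper) for a response rate."""
    p = responses / population
    half_width = z * math.sqrt(p * (1.0 - p) / population)
    return max(0.0, p - half_width), p, min(1.0, p + half_width)

for population in (1_000, 10_000, 100_000):
    lower, mean, upper = response_rate_interval(100, population)
    print(f"{population:>7}: mean={mean:.4%} lower={lower:.4%} upper={upper:.4%} "
          f"(relative half-width {(upper - mean) / mean:.1%})")
```

Under this approximation the relative uncertainty is driven mainly by the number of responses: with 100 responses the interval is close to ±20% of the mean whatever the node size, which is one reason to watch the number of responses per node.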

Confidence intervals

What makes a good segment?
If this is the average… Is this worth knowing? Is this?

Possible splits scale exponentially
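To see the growth, consider two-way splits of an unordered categorical field: k categories can be divided into two non-empty groups in 2^(k-1) − 1 ways. The sketch below counts and enumerates them. The function names are illustrative, and the category values are borrowed from the MaritalStatus example earlier in the deck.

```python
from itertools import combinations

def count_binary_splits(k):
    """Number of ways to divide k unordered categories into two non-empty groups."""
    return 2 ** (k - 1) - 1

def enumerate_binary_splits(categories):
    """List every two-way grouping of the categories exactly once."""
    cats = list(categories)
    first, rest = cats[0], cats[1:]
    splits = []
    # Fix the first category on the left so mirror images are not repeated;
    # r < len(rest) keeps the right-hand group non-empty.
    for r in range(len(rest)):
        for combo in combinations(rest, r):
            left = {first, *combo}
            right = set(cats) - left
            splits.append((left, right))
    return splits

marital_splits = enumerate_binary_splits(["Single", "Married", "Widow", "Misc"])
for k in (4, 10, 20):
    print(k, "categories ->", count_binary_splits(k), "possible two-way splits")
```

Four categories give 7 candidate splits, 10 give 511, and 20 give 524,287, which is why grouping small categories and reducing the number of categories keeps the search manageable.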

Reporting on your model
– Audit the model you build
– Monitor future ‘through the door’ populations

Where to find out more
– Quadstone System Support website:
– Documentation
  – What’s new in the Quadstone System 5.3 release notes
  – Updated Quadstone System help (F1)
  – Updated Quadstone System data-build command and TML reference
  – Updated Data Build Manager reference
  – Updated Quadstone System administration reference
  – Customer-specific release notes
– Quadstone System Support
  – Web Site:
  – Tel: US ; All

Questions?
Monday, February 22, 2016
– EMEA (Headquarters): The Smith Centre, The Fairmile, Henley-on-Thames, Oxfordshire, RG9 6AB, United Kingdom
– The Americas: 125 Summer Street, 16th Floor, Boston MA 02110, USA
– Asia Pacific: Level Young Street, Sydney NSW 2000, Australia