Bayesian Network Classifiers for Identifying the Slope of the Customer Lifecycle of Long-Life Customers
Authored by: Bart Baesens, Geert Verstraeten, Dirk Van den Poel, Michael Egmont-Petersen, Patrick Van Kenhove, Jan Vanthienen
Presentation by: Oksana Myachina, Jeff Janies

INTRODUCTION
- Acquiring a new customer is more costly than selling additional products to existing customers.
- Traditional brand strategies should be replaced by customer strategies.
- It is important to make informed decisions at the customer level.

- CRM is successful only if customers remain, at least to a certain extent, loyal to the company in question.
- Research shows large heterogeneity in the spending of long-term customers.
- The study described in this paper was performed in response to that finding.

The relevance of estimating a customer's spending evolution
- Traditional relationship marketing claims that loyal customers:
  - raise their spending
  - generate new customers
  - ensure diminishing serving costs
  - have reduced price sensitivity
- The main idea of relationship marketing: the longer a customer stays loyal to the company, the more profit the company earns.

Reinartz and Kumar state that long-life customers (LLC) are not necessarily:
- cheaper to serve
- less price sensitive
- more effective in bringing new business to the company
Example: mail company.

What is the aim of the study?
- To provide an accurate indication of a customer's future spending evolution
- To account for heterogeneity within the group of long-life customers
- To estimate whether newly acquired customers will increase or decrease their future spending

Aim and Methodology
- Binary classification problem: 'Will newly acquired customers increase or decrease their spending after their first purchase experiences?'
- Previous work: traditional statistical methods, nonparametric statistical models, neural networks
- Innovation: adaptation of Bayesian network classifiers

Naïve Bayes classifiers
- Often work well in practice
- Learn the class-conditional probabilities P(X_i = x_i | C = c_l)
- New test cases are classified by using Bayes' rule to compute the posterior probability of each class c_l given the vector of observed variable values (see handout)
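The handout referenced above is not part of this transcript. As a reference point, the standard naïve Bayes posterior under the conditional-independence assumption (which may differ in notation from the handout) is:

```latex
P(C = c_l \mid X_1 = x_1, \dots, X_n = x_n)
  \;\propto\; P(C = c_l) \prod_{i=1}^{n} P(X_i = x_i \mid C = c_l)
```

The predicted class is the c_l that maximizes this quantity.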

Naïve Bayes Classifier

TANs
- Tree Augmented Naïve Bayes classifiers (TANs) are an extension of the naïve Bayes classifier
- They relax the independence assumption by allowing arcs between the variables
- The class variable has no parents, and each variable has as parents the class variable and at most one other variable
- The attribute variables are only allowed to form a tree structure
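A common way to learn the attribute tree, following the TAN procedure of Friedman, Geiger, and Goldszmidt, is to build a maximum-weight spanning tree whose edge weights are class-conditional mutual information values; the standard definition of that weight (not necessarily the paper's exact notation) is:

```latex
I(X_i; X_j \mid C) = \sum_{x_i,\, x_j,\, c} P(x_i, x_j, c)\,
  \log \frac{P(x_i, x_j \mid c)}{P(x_i \mid c)\, P(x_j \mid c)}
```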

Tree Augmented Naïve Bayes classifier

GBN: Learning Algorithm
- Assumes an a priori ordering of the variables
- D-separation plays a pivotal role in the structure learning algorithm
- A four-phase algorithm:
  - Create a draft
  - Add and remove arcs based on the concept of d-separation and conditional independence
  - Establish parameters
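As a rough illustration of the kind of conditional-independence check a constraint-based structure learner relies on, the sketch below estimates conditional mutual information from discretized data and compares it to a threshold. The function names, the binning assumption, and the 0.01 threshold are hypothetical; this is not PowerPredictor's actual test.

```python
import numpy as np
import pandas as pd

def conditional_mutual_information(df, x, y, z):
    """Estimate I(X; Y | Z) in bits from a DataFrame of discrete-valued columns."""
    cmi = 0.0
    n = len(df)
    for _, dz in df.groupby(z):
        pz = len(dz) / n
        # Joint distribution of X and Y within this stratum of Z.
        joint = pd.crosstab(dz[x], dz[y]).to_numpy().astype(float)
        joint /= joint.sum()
        px = joint.sum(axis=1, keepdims=True)
        py = joint.sum(axis=0, keepdims=True)
        with np.errstate(divide="ignore", invalid="ignore"):
            terms = np.where(joint > 0, joint * np.log2(joint / (px * py)), 0.0)
        cmi += pz * terms.sum()
    return cmi

def looks_conditionally_independent(df, x, y, z, threshold=0.01):
    """Crude check: treat X and Y as independent given Z if the estimate is tiny."""
    return conditional_mutual_information(df, x, y, z) < threshold
```

An arc between two variables would be kept or dropped depending on whether such a test declares them dependent or independent given the candidate separating set.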

Multinet Bayesian Network Classifiers
- GBNs and TANs assume the relations between the variables are the same for all classes
- A multinet Bayesian network allows for more flexibility: it is composed of a separate, local network for each class, together with a prior probability distribution over the class node
- (see handout for formulas)
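The handout formulas are likewise not reproduced here. The usual multinet decision rule, assuming one local network B_c is learned per class c, picks the class that maximizes the class prior times the likelihood of the attribute vector under that class's network:

```latex
c^{*} = \arg\max_{c}\; P(C = c)\; P_{B_c}(x_1, \dots, x_n)
```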

Other methods used, but not discussed
- CL multinet
- C4.5 and C4.5rules: white-box classifiers for classification decisions
- Linear Discriminant Analysis (LDA): well-known benchmark statistical classifier
- Quadratic Discriminant Analysis (QDA): well-known benchmark statistical classifier

Training
- Naïve Bayes and TAN classifiers were trained with the Matlab Bayes Net Toolbox of Kevin Murphy
- GBN and GBN multinet classifiers were trained with the PowerPredictor software

Data Set
- Variables of the study
- Time frame
- Attributes, values, and encodings

Performance Classification
- Measured by the area under the receiver operating characteristic curve (AUROC)
- The ROC curve is a 2D graph of the sensitivity (true alarms) on the Y-axis versus the false alarms on the X-axis
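For readers who want to reproduce this kind of evaluation, the snippet below shows one common way to compute an ROC curve and its AUROC with scikit-learn. The labels and scores are made up for illustration; this is not the evaluation code used in the paper.

```python
import numpy as np
from sklearn.metrics import roc_curve, roc_auc_score

y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])        # hypothetical labels (1 = spending increases)
y_score = np.array([0.2, 0.4, 0.8, 0.7, 0.55, 0.3, 0.9, 0.45])  # hypothetical P(increase)

fpr, tpr, thresholds = roc_curve(y_true, y_score)   # false alarms vs. sensitivity at each threshold
print("AUROC:", roc_auc_score(y_true, y_score))
```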

Performance Classification
- Percentage correctly classified (PCC): the most commonly used measure of classifier performance
- Contingency table analysis is used to detect statistically significant performance differences between classifiers
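One widely used contingency-table test for comparing two classifiers on the same test set is McNemar's test. Assuming that is the kind of analysis meant here (the slide does not name the exact test), a minimal sketch with hypothetical counts is:

```python
from scipy.stats import chi2

# b = cases classifier A got right and classifier B got wrong; c = the reverse.
# These counts are hypothetical.
b, c = 40, 25

statistic = (abs(b - c) - 1) ** 2 / (b + c)  # chi-square statistic with continuity correction
p_value = chi2.sf(statistic, df=1)
print(f"McNemar statistic = {statistic:.3f}, p-value = {p_value:.4f}")
```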

Accuracy and Analysis: AUROC and ROC curve results (figure not reproduced in the transcript).

The results
- Naïve Bayes and TAN did not remove any attributes
- TAN added 14 arcs to the naïve Bayes classifier, with minimal performance improvement
- The GBN multinet looks simpler but performs poorly
- The GBN classifier was able to prune 12 attributes

The Unrestricted

Practical implementation
- Marketing investment decisions
- Monitoring of customer-acquisition policies
- Designing an a priori segmentation scheme for a company's customer base

THANK YOU!