Published by Shannon Green. Modified over 9 years ago.
1
Some working definitions…
‘Data Mining’ and ‘Knowledge Discovery in Databases’ (KDD) are used interchangeably
Data mining =
–the discovery of interesting, meaningful and actionable patterns hidden in large amounts of data
Multidisciplinary field originating from artificial intelligence, pattern recognition, statistics, machine learning, econometrics, …
2
Data mining is a process…
Business objectives
Model development
–Model objective
–Data collection & preparation
–Model construction
–Model evaluation
–Combining models with business knowledge into decision logic
Model / decision logic deployment
Model / decision logic monitoring
3
Data mining is a process… a marketing example
Business objectives
–Cross-sell MMS bundle to lapsed users / non-users
Model development
–Model objective: for consumers with no MMS bundle in the past 6 months, predict MMS bundle ownership (yes/no) in the next three months
–Data collection & preparation: take all fields for all active customers as of end APR05; remove all customers with an MMS bundle in NOV04-APR05; left join the MMS Bundle field from MAY05, JUNE05, JULY05
–Model construction: build various models to predict MMS Bundle MAY or JUNE or JULY = ‘N’ on 70% of the data
–Model evaluation: evaluate predictive power on the 70% of the data used for model development and on the 30% test set
–Combining models with business knowledge into decision logic: target the top 30% and randomly test two propositions (50 MMS for 5 Euro; 100 MMS for 7.50 Euro) across two channels (direct mail and SMS)
Model / decision logic deployment
–Run the campaign
Model / decision logic monitoring
–Compare predictions against actual response to evaluate model quality and robustness
–Determine which propositions / channels work best
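The 70/30 split and the "target the top 30%, randomly test propositions across channels" step can be sketched roughly as follows. This is a minimal illustration only: the function names, the fixed random seed and the made-up customer scores are assumptions, not part of the original campaign.

```python
import random

def split_train_test(customers, train_fraction=0.7, seed=0):
    """Shuffle a copy of the customer list and split it into development and test sets."""
    rng = random.Random(seed)
    shuffled = list(customers)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def assign_campaign_cells(scored_customers, top_fraction=0.3, seed=0):
    """Pick the top-scoring fraction and randomly assign a proposition and a channel."""
    rng = random.Random(seed)
    propositions = ["50 MMS for 5 Euro", "100 MMS for 7.50 Euro"]
    channels = ["Direct mail", "SMS"]
    # Highest model score first
    ranked = sorted(scored_customers, key=lambda pair: pair[1], reverse=True)
    n_target = round(len(ranked) * top_fraction)
    assignments = {}
    for customer_id, _score in ranked[:n_target]:
        assignments[customer_id] = (rng.choice(propositions), rng.choice(channels))
    return assignments

# Hypothetical data: (customer_id, model score) pairs
scored = [(i, i / 100) for i in range(100)]
train, test = split_train_test(scored)
cells = assign_campaign_cells(scored)
```

Because each targeted customer gets a randomly chosen proposition and channel, the later monitoring step can compare response rates per cell without the comparison being confounded by the model score.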
4
Data mining tasks
Undirected, explorative, descriptive, ‘unsupervised’ data mining
–Matching & search
–Profile & rule extraction
–Clustering & segmentation; dimension reduction
Directed, predictive, ‘supervised’ data mining
–Predictive modeling
5
Data mining task example: Clustering & segmentation
7
Start Looking Glass Source: Sentient Information Systems (www.sentient.nl)
8
Intermediate result Looking Glass Source: Sentient Information Systems (www.sentient.nl)
9
Result Looking Glass Source: Sentient Information Systems (www.sentient.nl)
10
Result Looking Glass Source: Sentient Information Systems (www.sentient.nl)
11
Data mining task example: predictive modeling
[Figure: past experience (data on good and bad behaviour) is used to build a model; the model places each case on a score scale from 1 (worse business) to 10 (better business). Case A is marked 7 and Case B is marked 4.]
12
Collected data
13
Data mining task example: predictive modeling
score = (0 x Income) + (-1 x Age) + (25 x Children)
14
Data mining techniques for predictive modeling
–Linear and logistic regression
–Decision trees
–Neural networks
–Nearest neighbor
–Genetic algorithms
–…
15
Linear Regression Models
score = (0 x Income) + (-1 x Age) + (25 x Children)
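As a small worked example, the scoring formula can be written as a function. The coefficients are the ones on the slide; the function name and the sample customer are made up for illustration.

```python
def score(income, age, children):
    """Linear score with the coefficients from the slide: 0, -1 and 25."""
    return (0 * income) + (-1 * age) + (25 * children)

# Hypothetical customer: income 30000, age 40, two children
example = score(income=30000, age=40, children=2)  # (0) + (-40) + (50) = 10
```

With a coefficient of 0, income has no effect on the score in this example; each extra child adds 25 points and each year of age subtracts 1.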
16
Regression in pattern space
[Figure: pattern space with axes age and income, showing class ‘circle’ and class ‘square’]
Only a single line available in pattern space to separate classes
17
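Decision Trees
[Figure: example decision tree; branch labels in the figure read no / yes / no]
20000 customers, response 1%; split: Income >150000?
–18800 customers; split: Purchases >10?
–1200 customers; split: balance>50000?
  –800 customers, response 1.8%
  –400 customers, response 0.1%
etc.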
18
Decision Trees in Pattern Space
[Figure: pattern space with axes age and income]
Line pieces perpendicular to axes
Each line is a split in the tree, two answers to a question
19
Decision Trees in Pattern Space
[Figure: pattern space with axes age and weight]
The goal of the classifier is to separate the classes (circle, square) on the basis of the attributes age and income
Each line corresponds to a split in the tree
Decision areas are ‘tiles’ in pattern space
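To make the "tiles" idea concrete, here is a minimal hand-written tree over age and income. The thresholds and the class assigned to each tile are invented for illustration; they are not taken from the slides.

```python
def classify(age, income):
    """Hypothetical two-level tree; every `if` is one axis-parallel split."""
    if income > 50000:      # split 1: a line perpendicular to the income axis
        if age > 40:        # split 2: a line perpendicular to the age axis
            return "square"
        return "circle"
    if age > 25:            # split 3: applies only to the low-income region
        return "circle"
    return "square"

# Each return statement corresponds to one rectangular tile in pattern space
tiles = {
    "high income, older": classify(age=55, income=80000),
    "high income, younger": classify(age=30, income=80000),
    "low income, older": classify(age=30, income=20000),
    "low income, younger": classify(age=20, income=20000),
}
```

Since every test compares a single attribute with a constant, the boundaries can only run parallel to the axes, which is why the decision areas come out as rectangles.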
20
Nearest Neighbour
The data itself is the classification model, so there is no abstraction like a tree etc.
For a given instance x, search for the k instances that are most similar to x
Classify x as the most frequent class among the k most similar instances
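The procedure above can be sketched in a few lines. This is an illustrative sketch that assumes Euclidean distance as the similarity measure (the slides do not name one) and uses made-up (age, weight) points.

```python
import math
from collections import Counter

def knn_classify(training, x, k=3):
    """training: list of (features, label) pairs; x: feature tuple to classify."""
    # Rank stored instances by Euclidean distance to x (an assumed similarity measure)
    by_distance = sorted(training, key=lambda item: math.dist(item[0], x))
    nearest_labels = [label for _features, label in by_distance[:k]]
    # Majority vote among the k nearest instances
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical training data: (age, weight) -> class
training = [
    ((25, 60), "circle"), ((30, 65), "circle"), ((28, 62), "circle"),
    ((60, 90), "square"), ((65, 95), "square"), ((58, 85), "square"),
]
prediction_young = knn_classify(training, (27, 63))
prediction_old = knn_classify(training, (62, 92))
```

There is no training step: all the work happens at classification time, when the stored instances are searched. In practice attributes on very different scales are usually rescaled first so that one attribute does not dominate the distance.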
21
Nearest Neighbor in Pattern Space
Classification
[Figure: pattern space with axes e.g. age and e.g. weight; a marker denotes the new instance]
Any decision area possible
Condition: enough data available
22
Nearest Neighbor in Pattern Space
Prediction
[Figure: pattern space with axes e.g. age and e.g. weight]
Any decision area possible
Condition: enough data available
23
Example classification algorithm 3: Neural Networks
Inspired by neuronal computation in the brain (McCulloch & Pitts 1943 (!))
Input (attributes) is coded as activation on the input layer neurons; activation feeds forward through a network of weighted links between neurons and causes activations on the output neurons (for instance diabetic yes/no)
The algorithm learns to find optimal weights using the training instances and a general learning rule.
24
Neural Networks
Example simple network (2 layers)
Probability of being diabetic = f(age * weight_age + body_mass_index * weight_body_mass_index)
[Figure: input neurons age and body_mass_index are linked through weight_age and weight_body_mass_index to one output neuron, the probability of being diabetic]
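A minimal sketch of this two-layer network follows. The slide leaves f unspecified; the logistic sigmoid is assumed here because it maps any weighted sum to a value between 0 and 1. The weight values in the usage line are invented, not fitted to any data.

```python
import math

def sigmoid(z):
    """Logistic function; an assumed choice for the slide's unspecified f."""
    return 1.0 / (1.0 + math.exp(-z))

def diabetic_probability(age, body_mass_index, weight_age, weight_body_mass_index):
    """Output activation = f(age * weight_age + body_mass_index * weight_body_mass_index)."""
    return sigmoid(age * weight_age + body_mass_index * weight_body_mass_index)

# Hypothetical weights, for illustration only
p = diabetic_probability(age=50, body_mass_index=30,
                         weight_age=0.02, weight_body_mass_index=-0.01)
```

Learning would consist of adjusting the two weights so that the output matches the known outcomes in the training instances as closely as possible.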
25
Neural Networks in Pattern Space
Classification
[Figure: pattern space with axes e.g. age and e.g. weight]
Simple network: only a line available (why?) to separate classes
Multilayer network: any classification boundary possible
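One way to see the "why?": a single output unit fires when w1*x1 + w2*x2 + bias crosses a threshold, and the points where that sum equals the threshold form a straight line. The sketch below uses a step activation and the XOR pattern as an assumed illustration (neither is from the slides): no single unit on the searched weight grid reproduces XOR, while a small hand-wired network with one hidden layer does.

```python
def step(z):
    """Threshold activation: fire (1) when the weighted sum is positive."""
    return 1 if z > 0 else 0

def single_unit(x1, x2, w1, w2, bias):
    """Simple network: one output unit, so one straight decision line."""
    return step(w1 * x1 + w2 * x2 + bias)

def two_layer_xor(x1, x2):
    """Hand-set multilayer network: two hidden units feed one output unit."""
    h_or = step(x1 + x2 - 0.5)    # fires if at least one input is 1
    h_and = step(x1 + x2 - 1.5)   # fires only if both inputs are 1
    return step(h_or - h_and - 0.5)

XOR_CASES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def any_single_unit_solves_xor():
    """Search a grid of weights from -3.0 to 3.0 in steps of 0.5."""
    grid = [v / 2 for v in range(-6, 7)]
    for w1 in grid:
        for w2 in grid:
            for bias in grid:
                if all(single_unit(a, b, w1, w2, bias) == target
                       for (a, b), target in XOR_CASES):
                    return True
    return False

multilayer_outputs = [two_layer_xor(a, b) for (a, b), _target in XOR_CASES]
single_layer_found = any_single_unit_solves_xor()
```

The grid search is only a demonstration; the underlying reason is that the two XOR classes cannot be separated by any single straight line, whereas the hidden units combine two lines into a band.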
26
Dilbert’s Perspective on Data Mining