Making Data Mining Models Useful to Model Non-paying Customers of Exchange Carriers. Wei Fan, IBM T.J. Watson; Janek Mathuria and Chang-tien Lu, Virginia Tech, Northern Virginia Center.



Our Selling Points
- A real, practical problem from an actual CLEC (competitive local exchange carrier) company.
- A complete process: we started with an ambitious goal, reality taught us a lesson, and we settled on a realistic solution.
- A new set of algorithms to calibrate probability outputs (distinct from Zadrozny and Elkan's calibration methods).

Challenging Problem
Differentiate between "late" and "default" customers:
- Late: one month past due.
- Default: two months past due.
- Default percentage: 20%.
Designed feature set (details in the paper):
- Calling summary.
- Billing summary.
These are the obvious features. Are there other useful ones out there? Maybe.

Failure of Commonly Used Methods
- Standard learners predict nearly every customer as paying on time, yet still achieve 80% accuracy.
- What this means: either our feature set is incomplete (probably), or the problem itself is inherently stochastic.
- Natural next step: cost-sensitive learning? The costs are impossible to define precisely because of the problem's complexity.
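This failure mode is easy to reproduce. On synthetic labels matching the paper's 20% default rate, a trivial "everyone pays on time" classifier already reaches 80% accuracy while catching no defaulters at all (a minimal sketch, not the authors' code):

```python
# Synthetic labels at the paper's stated 20% default rate:
# 1 = default, 0 = pays on time.
labels = [1] * 20 + [0] * 80

# Trivial baseline: predict "pays on time" for everyone.
predictions = [0] * len(labels)

accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
default_recall = sum(1 for p, y in zip(predictions, labels)
                     if y == 1 and p == 1) / sum(labels)

print(accuracy)        # 0.8
print(default_recall)  # 0.0 -- not a single defaulter is caught
```

This is why plain accuracy, and any learner optimizing it, is the wrong yardstick for this task.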

A Compromise Solution
Predict a reliable probability score instead of a hard label:
- A customer is uniquely identified by its feature vector.
- If the model predicts that a customer has a 20% chance to default, and customers with that feature vector indeed default 20% of the time, the predicted score is considered reliable.
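Reliability in this sense can be checked directly: bin customers by predicted score and compare each bin's average prediction with its observed default rate. The helper below is our own illustrative sketch, not code from the paper:

```python
from collections import defaultdict

def reliability_table(scores, labels, n_bins=10):
    """Bin predicted default probabilities and compare each bin's
    mean predicted score with its observed default rate.
    A reliable model has these two numbers close in every bin."""
    bins = defaultdict(list)
    for s, y in zip(scores, labels):
        b = min(int(s * n_bins), n_bins - 1)   # clamp score 1.0 into the top bin
        bins[b].append((s, y))
    return {b: (sum(s for s, _ in pairs) / len(pairs),   # mean predicted
                sum(y for _, y in pairs) / len(pairs))   # observed rate
            for b, pairs in sorted(bins.items())}

# Toy check: a score of 0.2 given to five customers, exactly one of whom
# defaults, is perfectly reliable (predicted 0.2, observed 0.2).
table = reliability_table([0.2] * 5, [1, 0, 0, 0, 0])
```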

Previously Proposed Calibration Methods
- Existing approaches output scores that are not reliable (Zadrozny and Elkan): decision trees, naive Bayes, SVM, logistic regression.
- They use a function mapping to calibrate the unreliable scores into reliable ones.
- Assumption: the original unreliable scores must be monotonic (rank-preserving) with respect to the true probabilities; otherwise the mapping is not applicable.
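A standard instance of such a monotone mapping is isotonic regression, fit with the pool-adjacent-violators (PAV) algorithm. The sketch below is our own illustration of that family of methods, not the paper's code; it relies on exactly the monotonicity assumption above, namely that the raw scores rank examples correctly:

```python
def pav_calibrate(scores, labels):
    """Isotonic regression via Pool Adjacent Violators (PAV): fit a
    monotone non-decreasing mapping from raw scores (sorted ascending)
    to calibrated probabilities."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    merged = []                      # blocks of [label_sum, count]
    for i in order:
        merged.append([labels[i], 1])
        # Merge adjacent blocks while their means decrease (a violation
        # of monotonicity), replacing both with their pooled mean.
        while (len(merged) > 1 and
               merged[-2][0] / merged[-2][1] > merged[-1][0] / merged[-1][1]):
            s, c = merged.pop()
            merged[-1][0] += s
            merged[-1][1] += c
    calibrated = []
    for s, c in merged:
        calibrated.extend([s / c] * c)
    return [scores[i] for i in order], calibrated

# Scores 0.1 < 0.2 < 0.3 with labels 0, 1, 0: the last two examples
# violate monotonicity, so PAV pools them to a shared probability 0.5.
_, cal = pav_calibrate([0.1, 0.2, 0.3], [0, 1, 0])
print(cal)  # [0.0, 0.5, 0.5]
```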

A Good Calibration

A Bad Calibration

Random Decision Trees
Amazingly simple and counter-intuitive:
- Do not use any purity check function; pick the splitting feature at random.
- For a continuous feature, pick a random splitting point; a continuous feature can be picked multiple times along a path.
- A discrete feature can be picked only once in one decision path.
- Tree depth is at most the number of features.
- Train on the original feature set and original data; no bootstrap sampling.
- Each tree computes a probability at its leaf nodes: with 10 fraud and 90 normal transactions at a leaf, p(fraud|x) = 0.1.
- Use multiple trees (10 at a minimum, 30 is usually enough) and average their probabilities.
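The recipe above can be sketched in a few lines. This is our simplified illustration for continuous features only, not the authors' implementation:

```python
import random

def build_random_tree(data, features, depth):
    """Grow a tree with no purity check: pick a feature at random and,
    for a continuous feature, a random split point from its observed
    range. `data` is a list of (feature_vector, label) pairs."""
    labels = [y for _, y in data]
    if depth == 0 or not features or len(set(labels)) <= 1:
        return {"leaf": True, "p": sum(labels) / len(labels)}
    f = random.choice(features)
    values = [x[f] for x, _ in data]
    t = random.uniform(min(values), max(values))
    left = [(x, y) for x, y in data if x[f] <= t]
    right = [(x, y) for x, y in data if x[f] > t]
    if not left or not right:                      # degenerate split
        return {"leaf": True, "p": sum(labels) / len(labels)}
    return {"leaf": False, "f": f, "t": t,
            "left": build_random_tree(left, features, depth - 1),
            "right": build_random_tree(right, features, depth - 1)}

def predict(tree, x):
    while not tree["leaf"]:
        tree = tree["left"] if x[tree["f"]] <= tree["t"] else tree["right"]
    return tree["p"]

def rdt_probability(trees, x):
    """Average the leaf probabilities over the ensemble
    (10 trees at a minimum, 30 is usually enough)."""
    return sum(predict(t, x) for t in trees) / len(trees)
```

On a toy one-dimensional problem where high feature values mean default, an ensemble of 30 such trees assigns a higher averaged probability to a high-valued point than to a low-valued one, despite every individual split being random.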

Random Forest +
A marriage between Random Decision Trees and Random Forest:
- Pick a feature subset at random.
- Compute the information gain for each feature in the subset and choose the one with the highest gain.
- Train on the original dataset; no bootstrap sampling.
- Each leaf node computes a probability.
- Use 10 to 30 trees.
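The hybrid split-selection step reads as follows in code: draw a random feature subset (as in Random Forest), then choose the feature and threshold with the highest information gain. Again a sketch of our own, not the paper's implementation:

```python
import math
import random

def entropy(labels):
    """Shannon entropy of a label list, in bits."""
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def best_split_from_random_subset(data, features, subset_size):
    """Pick a random subset of features, then return the
    (information_gain, feature_index, threshold) triple with the
    highest gain over candidate midpoint thresholds."""
    subset = random.sample(features, min(subset_size, len(features)))
    labels = [y for _, y in data]
    base = entropy(labels)
    best = None
    for f in subset:
        values = sorted(set(x[f] for x, _ in data))
        for i in range(len(values) - 1):
            t = (values[i] + values[i + 1]) / 2
            left = [y for x, y in data if x[f] <= t]
            right = [y for x, y in data if x[f] > t]
            gain = (base
                    - (len(left) / len(labels)) * entropy(left)
                    - (len(right) / len(labels)) * entropy(right))
            if best is None or gain > best[0]:
                best = (gain, f, t)
    return best
```

On data where one feature perfectly separates the classes, this step recovers that feature with an information gain of 1 bit, while an uninformative feature scores 0.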

Availability: the software is available upon request.