A General Framework for Mining Concept-Drifting Data Streams with Skewed Distributions. Jing Gao, Wei Fan, Jiawei Han, Philip S. Yu. University of Illinois at Urbana-Champaign and IBM T. J. Watson Research Center.

Presentation transcript:

A General Framework for Mining Concept-Drifting Data Streams with Skewed Distributions
Jing Gao, Wei Fan, Jiawei Han, Philip S. Yu
University of Illinois at Urbana-Champaign; IBM T. J. Watson Research Center

Introduction (1)
Data Stream
– Continuously arriving data flow
– Applications: network traffic, credit card transaction flows, phone call records, etc.

Introduction (2)
Stream Classification
– Construct a classification model based on past records
– Use the model to predict labels for new data
– Help decision making (e.g., label an incoming transaction as fraud or not)
[Figure: labeled past transactions train a classification model, which then labels new transactions as fraud or not]

Framework
[Figure: data chunks arrive over time; a classification model built from the labeled chunks predicts labels for the newest, unlabeled chunk]
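The framework can be pictured as a simple chunk-by-chunk loop. This is a generic sketch of stream classification, not the authors' implementation; train_model and the chunk format are hypothetical placeholders.

def stream_classify(chunks, train_model):
    # chunks: iterable of (X, y) pairs arriving over time;
    # train_model builds a classifier from one labeled chunk
    model, predictions = None, []
    for X, y in chunks:
        if model is not None:
            predictions.append(model.predict(X))   # predict labels for the newly arrived chunk
        model = train_model(X, y)                  # then update the model once labels are available
    return predictions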

Concept Drifts
Changes in P(x,y)
– P(x,y) = P(y|x)P(x), where x is the feature vector and y is the class label
– Four cases: no change, feature change (only P(x) changes), conditional change (only P(y|x) changes), dual change (both change)
– Expected error is not a good indicator of concept drift
– Training on the most recent data could help reduce the expected error
[Figure: example class boundaries at time stamps 1, 11, and 21 illustrating drift]
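As an illustration only (not the paper's data generator), here is a minimal sketch of a skewed synthetic stream in which P(x) and/or P(y|x) shift between chunks; all names and parameters are hypothetical.

import numpy as np

def make_chunk(n=1000, pos_frac=0.01, mean=0.0, flip=0.0, rng=None):
    rng = rng if rng is not None else np.random.default_rng(0)
    # P(x): Gaussian features whose mean shifts over time (feature change)
    x = rng.normal(loc=mean, scale=1.0, size=(n, 5))
    # P(y|x): rare positives defined by a threshold on a linear score (skewed distribution)
    score = x.sum(axis=1)
    y = (score > np.quantile(score, 1.0 - pos_frac)).astype(int)
    # flipping a small fraction of labels approximates a conditional change
    flip_mask = rng.random(n) < flip
    y[flip_mask] = 1 - y[flip_mask]
    return x, y

# dual change: both the feature mean and the labeling rule drift between chunks
chunks = [make_chunk(mean=0.1 * t, flip=0.02 * (t % 2)) for t in range(10)]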

Issues in Stream Classification (1)
Generative model
– Assumes P(y|x) follows some known distribution
Descriptive model
– Lets the data decide
Stream data
– The distribution is unknown and evolving, which favors a descriptive model

Issues in Stream Classification (2)
Label prediction
– Classify x into exactly one class
Probability estimation
– Assign x a probability of belonging to each class
Stream applications
– Stochastic in nature; prediction confidence information is needed, so probability estimates are preferred over hard labels
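The distinction can be seen with scikit-learn (shown only to illustrate the two outputs, not as the paper's implementation):

import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.random((200, 5))
y = (X[:, 0] > 0.9).astype(int)                 # rare positive class

clf = DecisionTreeClassifier(max_depth=3).fit(X, y)
hard_labels = clf.predict(X[:5])                # label prediction: one class per example
posteriors = clf.predict_proba(X[:5])           # probability estimation: a score per class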

Mining Skewed Data Streams
Skewed distribution
– Credit card fraud, network intrusions
Existing stream classification algorithms
– Evaluated on balanced data
Problems
– Minority (positive) examples are ignored
– The cost of misclassifying minority examples is usually huge
[Figure: a decision tree trained on skewed data in which every leaf node is classified as negative]
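A tiny numerical illustration (hypothetical numbers) of why a model that ignores the minority class can still look good on skewed data:

import numpy as np

y_true = np.array([1] * 10 + [0] * 990)   # 1% positive (e.g., fraud), 99% negative
y_pred = np.zeros_like(y_true)            # degenerate model: always predict negative

accuracy = (y_pred == y_true).mean()              # 0.99, looks excellent
minority_recall = y_pred[y_true == 1].mean()      # 0.0, every fraud case is missed
print(accuracy, minority_recall)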

Stream Ensemble Approach (1)
Step 1: Sampling
– The current chunk alone has insufficient positive examples to form a training set
– Combine positive examples accumulated from previous chunks with negative examples sampled from the current chunk (a sketch follows)
[Figure: accumulated positives from earlier chunks are merged with sampled negatives from the latest chunk]
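A minimal sketch of the sampling step, assuming the setup described on the "Why does this approach work?" slide (all accumulated positives are kept, and the current chunk's negatives are split into k disjoint subsets); function and variable names are hypothetical.

import numpy as np

def sample_training_sets(X_pos_old, X_chunk, y_chunk, k=5, rng=None):
    # keep every accumulated positive example; split current negatives into k disjoint parts
    rng = rng if rng is not None else np.random.default_rng(0)
    X_pos = np.vstack([X_pos_old, X_chunk[y_chunk == 1]])
    X_neg = X_chunk[y_chunk == 0]
    parts = np.array_split(rng.permutation(len(X_neg)), k)
    training_sets = []
    for p in parts:
        X = np.vstack([X_pos, X_neg[p]])
        y = np.concatenate([np.ones(len(X_pos)), np.zeros(len(p))])
        training_sets.append((X, y))
    return X_pos, training_sets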

Stream Ensemble Approach (2)
Step 2: Ensemble
– Train one classifier C1, C2, ..., Ck on each of the k sampled training sets
– Combine their outputs (e.g., average the probability estimates) when predicting
[Figure: k training sets feed k classifiers C1 ... Ck, whose outputs are combined]
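Continuing the sketch above (again an illustration, not the authors' code), each sampled training set gets its own base classifier and the ensemble averages their probability estimates:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def train_ensemble(training_sets):
    # one base classifier per disjoint training set
    return [DecisionTreeClassifier(max_depth=5).fit(X, y) for X, y in training_sets]

def ensemble_proba(models, X_new):
    # combine the members by averaging their positive-class probability estimates
    return np.mean([m.predict_proba(X_new)[:, 1] for m in models], axis=0)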

Why does this approach work?
Incorporation of old positive examples
– Increases the training size and reduces variance
– Negative examples reflect the current concept, so the increase in boundary bias is small
Ensemble
– Reduces the variance of a single model
– Disjoint sets of negative examples mean the classifiers make largely uncorrelated errors
Comparison with bagging and boosting
– Their running cost is much higher
– They cannot generate reliable probability estimates for skewed distributions
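A textbook bias-variance sketch (not reproduced from the paper) makes the variance argument concrete: if the k members have roughly uncorrelated errors with common variance $\sigma^2$, averaging gives

$$\mathrm{Var}\!\left(\frac{1}{k}\sum_{i=1}^{k} f_i(x)\right) = \frac{1}{k^{2}}\sum_{i=1}^{k}\mathrm{Var}\big(f_i(x)\big) = \frac{\sigma^{2}}{k} \quad \text{if } \mathrm{Cov}\big(f_i(x), f_j(x)\big)=0 \text{ for } i \neq j,$$

while the bias of the average equals the common bias of a single member. So the uncorrelated errors induced by disjoint negative sets let the ensemble cut variance without paying much in bias.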

Analysis
Error reduction
– From sampling
– From the ensemble
Efficiency analysis
– Single model vs. ensemble
– The ensemble is more efficient (see the cost sketch below)
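One back-of-the-envelope way to see the efficiency claim, under the assumption (not stated on the slide) that base-learner training time $f(n)$ grows superlinearly in the number of training examples $n$, with $|P|$ positives and $|N|$ sampled negatives:

$$T_{\text{single}} \approx f\big(|P| + |N|\big), \qquad T_{\text{ensemble}} \approx k \cdot f\!\left(|P| + \tfrac{|N|}{k}\right).$$

For example, with $f(n) = n^2$ and $|N| \gg |P|$, the ensemble costs roughly $|N|^2 / k$ versus $|N|^2$ for the single model, so splitting the negatives across k members reduces training time.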

Experiments
Measures
– Mean squared error
– ROC curve
– Recall-precision curve
Methods compared
– NS: no sampling + single model
– SS: sampling + single model
– SE: sampling + ensemble (the proposed approach)
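The measures can be computed, for instance, with scikit-learn (an illustration of the metrics only, not the paper's evaluation code); here p is an ensemble's estimated probability of the positive class on hypothetical data.

import numpy as np
from sklearn.metrics import mean_squared_error, roc_curve, precision_recall_curve, auc

y_true = np.array([1, 0, 0, 1, 0, 0, 0, 0])                    # true labels (positives are rare)
p = np.array([0.9, 0.2, 0.1, 0.6, 0.3, 0.05, 0.4, 0.15])       # estimated P(y=1|x)

mse = mean_squared_error(y_true, p)                        # mean squared error of the estimates
fpr, tpr, _ = roc_curve(y_true, p)                         # points on the ROC curve
precision, recall, _ = precision_recall_curve(y_true, p)   # points on the recall-precision curve
print(mse, auc(fpr, tpr))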

Experimental Results (1)
Mean squared error on synthetic data
[Plots for three drift types: feature change (only P(x) changes), conditional change (only P(y|x) changes), dual change (both P(x) and P(y|x) change)]

Experimental Results (2)
Mean squared error on real data

Experimental Results (3)
ROC curve and recall-precision plots on synthetic data

Experimental Results (4)
ROC curve and recall-precision plots on real data

Experimental Results (5) Training Time

Conclusions
General issues in stream classification
– Concept drifts
– Descriptive models
– Probability estimation
Mining skewed data streams
– Sampling and ensemble techniques
– Accurate and efficient
Wide applications
– Graph data
– Air force data

Thanks! Any questions?