Data Stream Mining and Incremental Discretization
John Russo, CS561 Final Project, April 26, 2007


Overview
- Introduction
- Data Mining: A Brief Overview
- Histograms
- Challenges of Streaming Data to Data Mining
- Using Histograms for Incremental Discretization of Data Streams
- Fuzzy Histograms
- Future Work

Introduction
- Data mining
  - A class of algorithms for knowledge discovery: patterns, trends, predictions
  - Utilizes statistical methods, neural networks, genetic algorithms, decision trees, etc.
- Streaming data presents unique challenges to traditional data mining
  - Non-persistence: one opportunity to mine
  - High data rates
  - Non-discrete values
  - Distributions that change over time
  - Huge volumes of data

Data Mining: Types of Relationships
- Classes: predetermined groups
- Clusters: groups of related data
- Sequential patterns: used to predict behavior
- Associations: rules are built from associations between data

Data Mining Algorithms
- K-means clustering
  - Unsupervised learning algorithm
  - Partitions the data set into k pre-defined clusters
- Decision trees
  - Used to generate rules for classification
  - Two common types: CART and CHAID
- Nearest neighbor
  - Classifies a record based upon similar records in a historical dataset
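To make the k-means idea above concrete, here is a minimal sketch in plain Python (the example points and k = 2 are made up for illustration; a real application would use a library implementation):

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means: assign each point to its nearest centroid,
    then recompute each centroid as the mean of its cluster."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)          # initialize from the data
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # index of the nearest centroid (squared Euclidean distance)
            i = min(range(k), key=lambda c: sum((a - b) ** 2
                                                for a, b in zip(p, centroids[c])))
            clusters[i].append(p)
        for i, cl in enumerate(clusters):
            if cl:                             # keep the old centroid if a cluster empties
                centroids[i] = tuple(sum(dim) / len(cl) for dim in zip(*cl))
    return centroids, clusters

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, clusters = kmeans(pts, 2)
```

On these two well-separated blobs the centroids converge to roughly (0.33, 0.33) and (10.33, 10.33) regardless of which two points seed the initialization.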

Data Mining Algorithms (continued)
- Rule induction
  - Uses statistical significance to find interesting rules
- Data visualization
  - Uses graphics for mining

Histograms and Data Mining

Histograms and Supervised Learning – An Example

- We have two classes:
  - Mortgage approval = "Yes": P(approval = "Yes") = 5/10 = .5
  - Mortgage approval = "No": P(approval = "No") = 5/10 = .5
- Let's calculate some of the conditional probabilities based upon the training data:
  - P(age <= 30 | approval = "Yes") = 2/5 = .4
  - P(age <= 30 | approval = "No") = 2/5 = .4
  - P(income = "Low" | approval = "Yes") = 2/5 = .4
  - P(income = "Low" | approval = "No") = 2/5 = .4
  - P(income = "Medium" | approval = "Yes") = 1/5 = .2
  - P(income = "Medium" | approval = "No") = 1/5 = .2
  - P(marital status = "Married" | approval = "Yes") = 3/5 = .6
  - P(marital status = "Married" | approval = "No") = 3/5 = .6
  - P(credit rating = "Good" | approval = "Yes") = 1/5 = .2
  - P(credit rating = "Good" | approval = "No") = 2/5 = .4

Histograms and Supervised Learning – An Example
- We will use Bayes' rule and the naïve assumption that all attributes are independent:
  - P(C = c | A1 = a1, ..., Ak = ak) is proportional to P(C = c) * P(A1 = a1 | C = c) * ... * P(Ak = ak | C = c)
  - The denominator P(A1 = a1, ..., Ak = ak) is irrelevant, since it is the same for every class
- Now, let's predict the class for one observation:
  - X = (age <= 30, income = "Medium", marital status = "Married", credit rating = "Good")

Histograms and Supervised Learning – An Example
- P(X | approval = "Yes") = .4 * .2 * .6 * .2 = .0096
- P(X | approval = "No") = .4 * .2 * .6 * .4 = .0192
- P(X | C = c) * P(C = c):
  - "Yes": .0096 * .5 = .0048
  - "No": .0192 * .5 = .0096
- X belongs to the "No" class.
- The probabilities are determined by frequency counts; the frequencies are tabulated in bins.
- Two common types of histograms:
  - Equal-width: the range of observed values is divided into k equal intervals
  - Equal-frequency: the frequencies are (roughly) equal in all bins
- The difficulty is determining the number of bins, k:
  - Sturges' rule
  - Scott's rule
- Determining k for a data stream is problematic
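The naïve Bayes computation in this example can be reproduced in a few lines of Python. The conditional probabilities below are the ones read off the training data in the slides; the attribute labels are just illustrative keys:

```python
# Naive Bayes prediction for the mortgage example.
priors = {"yes": 0.5, "no": 0.5}
likelihoods = {  # P(attribute value | class), tabulated from the training data
    "yes": {"age<=30": 0.4, "income=medium": 0.2, "status=married": 0.6, "credit=good": 0.2},
    "no":  {"age<=30": 0.4, "income=medium": 0.2, "status=married": 0.6, "credit=good": 0.4},
}

def predict(observation):
    """Score each class as prior * product of likelihoods; return the argmax."""
    scores = {}
    for c in priors:
        p = priors[c]
        for attr in observation:
            p *= likelihoods[c][attr]
        scores[c] = p
    return max(scores, key=scores.get), scores

x = ["age<=30", "income=medium", "status=married", "credit=good"]
label, scores = predict(x)
# scores["yes"] = .5 * .4 * .2 * .6 * .2 = 0.0048
# scores["no"]  = .5 * .4 * .2 * .6 * .4 = 0.0096, so label is "no"
```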

Challenges of Streaming Data to Data Mining
- Determining k for a histogram or for machine learning
- Concept drift
  - Data from the past is no longer valid for the model today
- Several approaches:
  - Incremental learning: CVFDT
  - Ensemble classifiers
  - Ambiguous decision trees
- What about the "ebb and flow" problem?
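For a static sample, the two bin-count rules named on the previous slide are simple closed-form formulas, which is exactly why they break down when n and the range are unknown in advance on a stream. A sketch:

```python
import math
import statistics

def sturges_k(n):
    """Sturges' rule: k = ceil(log2(n)) + 1 bins for a sample of size n."""
    return math.ceil(math.log2(n)) + 1

def scott_k(sample):
    """Scott's rule: optimal bin width h = 3.49 * stdev * n^(-1/3);
    the bin count is the observed range divided by h."""
    n = len(sample)
    h = 3.49 * statistics.stdev(sample) * n ** (-1 / 3)
    return max(1, math.ceil((max(sample) - min(sample)) / h))
```

Both rules need the full sample (n, the standard deviation, the range) before binning, which a one-pass stream algorithm does not have; that is the gap PID addresses.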

Incremental Discretization
- A way to create discrete intervals from a data stream
- Partition Incremental Discretization (PID) algorithm (Gama and Pinto)
- A two-level algorithm:
  - Level 1 creates intervals in a single pass over the stream
  - Level 2 aggregates level-1 intervals into level-2 intervals

Incremental Discretization Example

- Sensor data reporting air temperature, soil moisture, and the flow of water in a sprinkler
- The data shown on the previous slide is training data
- Once trained, the model can predict what the sprinkler should be set to, based upon conditions
- A four-class problem (sprinkler settings: High, Med, Low, Off)

Incremental Discretization Example
- We will walk through level 1 for the temperature attribute:
  - Decide an estimated range: 30–85
  - Pick a number of intervals (11), so the step is 5
  - Maintain two vectors: breaks and counts
  - Set a threshold for splitting an interval: 33% of all observed values
- Then work through the training set:
  - If a value falls below the lower bound of the range, add a new interval before the first interval
  - If a value falls above the upper bound of the range, add a new interval after the last interval
  - If an interval's count reaches the threshold, split it evenly and divide the count between the old interval and the new one
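The level-1 update steps above can be sketched as follows. This is an illustrative reading of the algorithm, not Gama and Pinto's code; counts are kept as floats so that splitting an interval can divide its count exactly in half:

```python
import bisect

class PIDLayer1:
    """Sketch of PID layer 1: a single pass over the stream, with
    equal-width intervals that grow at the ends and split when heavy."""

    def __init__(self, lo, hi, n_intervals, split_frac=1 / 3):
        step = (hi - lo) / n_intervals
        self.breaks = [lo + i * step for i in range(n_intervals + 1)]
        self.counts = [0.0] * n_intervals
        self.split_frac = split_frac
        self.total = 0

    def update(self, x):
        self.total += 1
        if x < self.breaks[0]:                 # below range: new first interval
            self.breaks.insert(0, x)
            self.counts.insert(0, 1.0)
        elif x > self.breaks[-1]:              # above range: new last interval
            self.breaks.append(x)
            self.counts.append(1.0)
        else:
            i = min(bisect.bisect_right(self.breaks, x) - 1, len(self.counts) - 1)
            self.counts[i] += 1
            if self.counts[i] / self.total > self.split_frac:
                # too heavy: split evenly, dividing the count between old and new
                mid = (self.breaks[i] + self.breaks[i + 1]) / 2
                half = self.counts[i] / 2
                self.breaks.insert(i + 1, mid)
                self.counts[i] = half
                self.counts.insert(i + 1, half)

h = PIDLayer1(30, 85, 11)            # the slide's range 30-85, 11 intervals, step 5
for temp in [32, 45, 45, 61, 78, 29, 90]:
    h.update(temp)
```

After the loop, `h.breaks` has grown at both ends (29 below, 90 above) and the heavy early intervals have been split, while the counts still sum to the number of observations.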

Incremental Discretization Example  Breaks vector for our sample after training  Counts vector for our sample after training

Second Layer
- The second layer is invoked whenever necessary:
  - User intervention
  - Changes in the intervals of the first layer
- Input:
  - Breaks and counters from layer 1
  - Type of histogram to be generated

Second Layer
- The objective is to create a smaller number of intervals based upon the layer-1 intervals
- For equal-width histograms:
  - Computes the number of intervals based upon the range observed in layer 1
  - Traverses the vector of breaks once, adding the counters of consecutive intervals
- For equal-frequency histograms:
  - Computes the exact number of data points wanted in each interval
  - Traverses the counters, adding the counts of consecutive intervals
  - Closes each layer-2 interval when the target frequency is reached
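The equal-frequency aggregation described above can be sketched as a single pass over the layer-1 vectors (again an illustrative reading, not the authors' code):

```python
def layer2_equal_frequency(breaks, counts, n_intervals):
    """Sketch of PID layer 2, equal-frequency variant: merge consecutive
    layer-1 intervals, closing a layer-2 interval each time the target
    frequency (total / n_intervals) is reached."""
    target = sum(counts) / n_intervals
    out_breaks = [breaks[0]]
    acc = 0.0
    for i, c in enumerate(counts):
        acc += c
        if acc >= target and len(out_breaks) < n_intervals:
            out_breaks.append(breaks[i + 1])   # close this layer-2 interval
            acc = 0.0
    if out_breaks[-1] != breaks[-1]:
        out_breaks.append(breaks[-1])          # the last interval absorbs the remainder
    return out_breaks

# Four layer-1 intervals with one point each, merged into two layer-2 intervals:
boundaries = layer2_equal_frequency([0, 1, 2, 3, 4], [1, 1, 1, 1], 2)
# boundaries is [0, 2, 4]
```

The equal-width variant is the same traversal, except the layer-2 boundaries are fixed up front from the observed layer-1 range and only the counters are merged.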

Application of PID for Data Mining
- Add a data structure to both layer 1 and layer 2
- A matrix:
  - Columns: intervals
  - Rows: classes
- Naïve Bayesian classification can then be done easily

Example Matrix: Temperature Attribute
- Rows (classes): High, Med, Low, Off; columns: temperature intervals (the counts appear in a figure in the original slides)
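The class-by-interval matrix can be sketched as nested dictionaries; incrementing a cell per observation gives exactly the frequency counts naïve Bayes needs. The interval labels and observations below are made up for illustration:

```python
# Interval-by-class count matrix for one attribute (temperature).
classes = ["High", "Med", "Low", "Off"]          # the sprinkler settings
intervals = ["[30,40)", "[40,50)", "[50,60)", "[60,85]"]   # hypothetical layer-2 intervals
matrix = {c: {iv: 0 for iv in intervals} for c in classes}

def observe(interval, cls):
    """One training observation: the attribute fell in `interval`, labeled `cls`."""
    matrix[cls][interval] += 1

def p_interval_given_class(interval, cls):
    """P(attribute in interval | class), read straight off the matrix row."""
    row_total = sum(matrix[cls].values())
    return matrix[cls][interval] / row_total if row_total else 0.0

observe("[30,40)", "Off")
observe("[60,85]", "High")
observe("[60,85]", "High")
observe("[40,50)", "Low")
```

Multiplying these per-attribute conditionals with the class priors reproduces the naïve Bayes computation from the mortgage example, now driven by the stream's discretized intervals.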

Dealing with Concept Drift
- What happens when the training is no longer valid (for example, in winter)?
- Assume the sensors are still on in winter but the sprinklers are not

Dealing with Concept Drift: Fuzzy Histograms
- Fuzzy histograms are used for visual content representation
- A given attribute value can be a member of more than one interval
  - With varying degrees of membership
  - The degree of membership is determined by a membership function

Fuzzy Histograms with PID
- Use the membership function to build layer-2 intervals based upon a determinant in layer 1
- Sprinkler example:
  - Soil moisture is potentially a member of more than one interval
  - One interval is a high value
  - During winter, ensure that all values of moisture fall into the highest end of the range
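A common membership function is the triangular one; the sketch below shows a soil-moisture value belonging to two overlapping intervals at once, with different degrees of membership. The interval breakpoints are made up for illustration:

```python
def triangular(x, a, b, c):
    """Triangular membership function: 0 at a and c, rising to 1 at the peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Overlapping fuzzy intervals for soil moisture (hypothetical breakpoints)
fuzzy_bins = {"low": (0, 10, 30), "medium": (20, 40, 60), "high": (50, 70, 100)}

def memberships(x):
    """Degree of membership of x in every fuzzy interval."""
    return {name: triangular(x, *abc) for name, abc in fuzzy_bins.items()}

m = memberships(25)
# A moisture reading of 25 is partly "low" and partly "medium", not "high".
```

Because the intervals overlap, a single observation can contribute (fractionally) to several histogram bins, which is what lets the winter-time moisture readings be pushed toward the high end of the range without hard reassignment.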

References
- [1] Hand, David, Heikki Mannila, and Padhraic Smyth. Principles of Data Mining. Cambridge, MA: MIT Press.
- [2] Sturges, H. (1926). The choice of a class-interval. J. Amer. Statist. Assoc., 21, 65–66.
- [3] Scott, D.W. (1979). On optimal and data-based histograms. Biometrika, 66.
- [4] Freedman, David and Persi Diaconis (1981). On the histogram as a density estimator: L2 theory. Probability Theory and Related Fields, 57(4).
- [5] Zhang, Jianping, Huan Liu, and Paul P. Wang. Some current issues of streaming data mining. Information Sciences, Volume 176, Issue 14 (Streaming Data Mining), 22 July 2006.
- [6] Hulten, G., Spencer, L., and Domingos, P. Mining time-changing data streams. In Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (San Francisco, California, 2001). KDD '01. ACM Press, New York, NY.
- [7] Wang, H., Fan, W., Yu, P. S., and Han, J. Mining concept-drifting data streams using ensemble classifiers. In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (Washington, D.C., 2003). KDD '03. ACM Press, New York, NY.
- [8] Natwichai, J. and Li, X. (2004). Knowledge Maintenance on Data Streams with Concept Drifting. Shanghai, China.
- [9] Gama, J. and Pinto, C. Discretization from data streams: applications to histograms and data mining. In Proceedings of the 2006 ACM Symposium on Applied Computing (Dijon, France, 2006). SAC '06. ACM Press, New York, NY.
- [10] Doulamis, Anastasios and Nikolaos Doulamis. Fuzzy histograms for efficient visual content representation: application to content-based image retrieval. In IEEE International Conference on Multimedia and Expo (ICME '01), page 227. IEEE Press, 2001.
- [11] Gaber, M.M., Zaslavsky, A. and Krishnaswamy, S. (2005). Mining data streams: a review. SIGMOD Rec., vol. 34, no. 2.

Questions?