Third International Workshop on Knowledge Discovery from Data Streams, 2006 Classification of Changes in Evolving Data Streams using Online Clustering.

Slides:



Advertisements
Similar presentations
A General Framework for Mining Concept-Drifting Data Streams with Skewed Distributions Jing Gao Wei Fan Jiawei Han Philip S. Yu University of Illinois.
Advertisements

Prof. Carolina Ruiz Department of Computer Science Worcester Polytechnic Institute INTRODUCTION TO KNOWLEDGE DISCOVERY IN DATABASES AND DATA MINING.
Computer Security Lab Concordia Institute for Information Systems Engineering Concordia University Montreal, Canada A Novel Approach of Mining Write-Prints.
1 Data Stream Management Systems Checkpoint CS240B Notes by Carlo Zaniolo UCLA CSD With slides from a KDD04 tutorial by Haixun Wang, Jian Pei & Philip.
9/15/2008 CTBTO Data Mining/Data Fusion Workshop 1 Spatiotemporal Stream Mining Applied to Seismic+ Data Margaret H. Dunham CSE Department Southern Methodist.
Data Stream Classification: Training with Limited Amount of Labeled Data Mohammad Mehedy Masud Latifur Khan Bhavani Thuraisingham University of Texas at.
Streaming Pattern Discovery in Multiple Time-Series Spiros Papadimitriou Jimeng Sun Christos Faloutsos Carnegie Mellon University VLDB 2005, Trondheim,
A KLT-Based Approach for Occlusion Handling in Human Tracking Chenyuan Zhang, Jiu Xu, Axel Beaugendre and Satoshi Goto 2012 Picture Coding Symposium.
Date : 21 st of May, Shri Ramdeo Baba College of Engineering and Management Presentation By : Rimjhim Singh Under the Guidance of: Dr. M.B. Chandak.
Ranking Electrical Feeders of the New York City Power Grid Phil Gross Ansaf Salleb-Abouissi Haimonti Dutta Albert Boulanger Problem Primary electricity.
Civil and Environmental Engineering Carnegie Mellon University Sensors & Knowledge Discovery (a.k.a. Data Mining) H. Scott Matthews April 14, 2003.
Introduction to Automatic Classification Shih-Wen (George) Ke 7 th Dec 2005.
On Appropriate Assumptions to Mine Data Streams: Analyses and Solutions Jing Gao† Wei Fan‡ Jiawei Han† †University of Illinois at Urbana-Champaign ‡IBM.
Visualisation of Cluster Dynamics and Change Detection in Ubiquitous Data Stream Mining Authors Brett Gillick, Mohamed Medhat Gaber, Shonali Krishnaswamy,
Example Data Sets Prior Research Join related objects to form independent compound objects, cluster normally (Yin et al., 2005). Use attribute-based distance.
1 © Goharian & Grossman 2003 Introduction to Data Mining (CS 422) Fall 2010.
Attention and Event Detection Identifying, attributing and describing spatial bursts Early online identification of attention items in social media Louis.
Kansas State University Department of Computing and Information Sciences CIS 830: Advanced Topics in Artificial Intelligence From Data Mining To Knowledge.
Learning from Imbalanced, Only Positive and Unlabeled Data Yetian Chen
Semi-Supervised Learning with Concept Drift using Particle Dynamics applied to Network Intrusion Detection Data Fabricio Breve Institute of Geosciences.
ENN: Extended Nearest Neighbor Method for Pattern Recognition
Semantic History Embedding in Online Generative Topic Models Pu Wang (presenter) Authors: Loulwah AlSumait Daniel Barbará
SoundSense by Andrius Andrijauskas. Introduction  Today’s mobile phones come with various embedded sensors such as GPS, WiFi, compass, etc.  Arguably,
1 Data Mining Books: 1.Data Mining, 1996 Pieter Adriaans and Dolf Zantinge Addison-Wesley 2.Discovering Data Mining, 1997 From Concept to Implementation.
Mean-shift and its application for object tracking
VLDB 2012 Mining Frequent Itemsets over Uncertain Databases Yongxin Tong 1, Lei Chen 1, Yurong Cheng 2, Philip S. Yu 3 1 The Hong Kong University of Science.
Mobile Data Mining for Intelligent Healthcare Support By: Pari Delir Haghighi, Arkady Zaslavsky, Shonali Krishnaswamy, Mohamed Medhat.
Automatically Extracting Data Records from Web Pages Presenter: Dheerendranath Mundluru
1 ENTROPY-BASED CONCEPT SHIFT DETECTION PETER VORBURGER, ABRAHAM BERNSTEIN IEEE ICDM 2006 Speaker: Li HueiJyun Advisor: Koh JiaLing Date:2007/11/6 1.
Differentially Private Data Release for Data Mining Noman Mohammed*, Rui Chen*, Benjamin C. M. Fung*, Philip S. Yu + *Concordia University, Montreal, Canada.
Data Stream Mining and Incremental Discretization John Russo CS561 Final Project April 26, 2007.
updated CmpE 583 Fall 2008 Ontology Integration- 1 CmpE 583- Web Semantics: Theory and Practice ONTOLOGY INTEGRATION Atilla ELÇİ Computer.
A learning approach for reducing data packets in sensor networks Yinghui Na.
On Node Classification in Dynamic Content-based Networks.
1 Chapter 6. Classification and Prediction Overview Classification algorithms and methods Decision tree induction Bayesian classification Lazy learning.
Anomaly Detection in Data Mining. Hybrid Approach between Filtering- and-refinement and DBSCAN Eng. Ştefan-Iulian Handra Prof. Dr. Eng. Horia Cioc ârlie.
Adaptive Mining Techniques for Data Streams using Algorithm Output Granularity Mohamed Medhat Gaber, Shonali Krishnaswamy, Arkady Zaslavsky In Proceedings.
Classification and Novel Class Detection in Data Streams Classification and Novel Class Detection in Data Streams Mehedy Masud 1, Latifur Khan 1, Jing.
Learning from Positive and Unlabeled Examples Investigator: Bing Liu, Computer Science Prime Grant Support: National Science Foundation Problem Statement.
Consensus Group Stable Feature Selection
D-skyline and T-skyline Methods for Similarity Search Query in Streaming Environment Ling Wang 1, Tie Hua Zhou 1, Kyung Ah Kim 2, Eun Jong Cha 2, and Keun.
Streaming Pattern Discovery in Multiple Time-Series Jimeng Sun Spiros Papadimitrou Christos Faloutsos PARALLEL DATA LABORATORY Carnegie Mellon University.
CSCE 5073 Section 001: Data Mining Spring Overview Class hour 12:30 – 1:45pm, Tuesday & Thur, JBHT 239 Office hour 2:00 – 4:00pm, Tuesday & Thur,
Anomaly Detection. Network Intrusion Detection Techniques. Ştefan-Iulian Handra Dept. of Computer Science Polytechnic University of Timișoara June 2010.
An Energy-Efficient Approach for Real-Time Tracking of Moving Objects in Multi-Level Sensor Networks Vincent S. Tseng, Eric H. C. Lu, & Kawuu W. Lin Institute.
1 Systematic Data Selection to Mine Concept-Drifting Data Streams Wei Fan Proceedings of the 2004 ACM SIGKDD international conference on Knowledge discovery.
On Reducing Classifier Granularity in Mining Concept-Drifting Data Streams Peng Wang, H. Wang, X. Wu, W. Wang, and B. Shi Proc. of the Fifth IEEE International.
Mining Concept-Drifting Data Streams Using Ensemble Classifiers Haixun Wang Wei Fan Philip S. YU Jiawei Han Proc. 9 th ACM SIGKDD Internal Conf. Knowledge.
1 On Demand Classification of Data Streams Charu C. Aggarwal Jiawei Han Philip S. Yu Proc Int. Conf. on Knowledge Discovery and Data Mining (KDD'04),
Presented by Niwan Wattanakitrungroj
Dr. Hongqin FAN Department of Building and Real Estate
Model Discovery through Metalearning
Efficient Image Classification on Vertically Decomposed Data
Table 1. Advantages and Disadvantages of Traditional DM/ML Methods
Data Mining: Concepts and Techniques (3rd ed
Query in Streaming Environment
An Enhanced Support Vector Machine Model for Intrusion Detection
Lin Lu, Margaret Dunham, and Yu Meng
University of Houston, USA
Data Mining: Concepts and Techniques Course Outline
Finding Clusters within a Class to Improve Classification Accuracy
Efficient Image Classification on Vertically Decomposed Data
Action Association Rules Mining
Yongli Zhang and Christoph F. Eick University of Houston, USA
Prepared by: Mahmoud Rafeek Al-Farra
Remah Alshinina and Khaled Elleithy DISCRIMINATOR NETWORK
CSCE 4143 Section 001: Data Mining Spring 2019.
Modeling IDS using hybrid intelligent systems
Promising “Newer” Technologies to Cope with the
Presentation transcript:

Third International Workshop on Knowledge Discovery from Data Streams, 2006 Classification of Changes in Evolving Data Streams using Online Clustering Result Deviation Mohamed Medhat Gaber 1, and Philip S. Yu 2 1 Caulfield School of Information Technology, Monash University, 900 Dandenong Rd, Caulfield East, VIC3145, Australia 2 IBM Thomas J. Watson Research Center, Hawthorne, NY

Third International Workshop on Knowledge Discovery from Data Streams, 2006 Introduction Traditional data mining algorithms have mainly focused on clustering, classification and frequent pattern analysis techniques. Detecting changes in data streams is an essential data mining process due to the evolving nature of streaming data. Data streams follow a stable data distribution within a domain in the normal situation in the context of a specific application. A change in the distribution and/or domain represents an event or a phenomenon that has already occurred or will occur.

Third International Workshop on Knowledge Discovery from Data Streams, 2006 Related Work Aggarwal [Agg02], [Agg03] has proposed the use of differential kernel density estimation over time windows to detect the rate of change in data densities.  Changing the window size provides the user by the ability to detect both long-term and short-term changes. Ben-David et al [BGK04] have studied distribution change detection in data streams using statistical tests that are sensitive to distribution changes.  The techniques used are non-parametric and can distinguish between statistically significant change and noise.

Third International Workshop on Knowledge Discovery from Data Streams, 2006 Our Proposed Framework Training Phase  Detecting changes and associating them with events using STREAM- DETECT. Classification Phase  Detecting changes and classifying them using voting-based classification using the logged history of changes using CHANGE- CLASS. Figure 1. Our Proposed Framework

Third International Workshop on Knowledge Discovery from Data Streams, 2006 STREAM-DETECT Algorithm Figure 2. STREAM-DETECT Algorithm

Third International Workshop on Knowledge Discovery from Data Streams, 2006 Clustering Deviation Measurements

Third International Workshop on Knowledge Discovery from Data Streams, 2006 CHANGE-CLASS Algorithm This is done through the voting of the nearest neighbor of each change attribute. The classification result will be the event that won the highest vote i.e., the event that has attracted the majority of change attributes. Figure 3. CHANGE-CLASS Algorithm

Third International Workshop on Knowledge Discovery from Data Streams, 2006 Tuning System Parameters Time frame duration (TF) and threshold value (Th) should be set according to the application requirements. Figure 4. Parameter-Tune Algorithm

Third International Workshop on Knowledge Discovery from Data Streams, 2006 Experimental Settings The proposed framework has been implemented using Matlab We run the experiments on a workstation with Pentium 4 with 3.00 GHz CPU and 504 MB of RAM. The data used in the experiments have been generated from both uniform and normal distributions with changing domain and/or distribution parameters to represent a change in the streaming data. We have generated three different categories of datasets. Figure 5. Events and Dataset Categories

Third International Workshop on Knowledge Discovery from Data Streams, 2006 Experimental Results (1) Figure 6. Time Frame Duration Effect on the Detection Performance

Third International Workshop on Knowledge Discovery from Data Streams, 2006 Experimental Results (2) Figure 7. Threshold Value Effect on the Detection Performance

Third International Workshop on Knowledge Discovery from Data Streams, 2006 Efficiency Tips Detecting small changes requires low threshold value. Detecting dramatic changes requires high threshold value. Detecting frequently occurring changes requires short time frame. Detecting less frequent events requires long time frames. Detecting changes in unstable datasets requires high threshold value. Detecting changes in stable datasets requires low threshold value.

Third International Workshop on Knowledge Discovery from Data Streams, 2006 Experimental Results (3) Table 1. Classification Robustness

Third International Workshop on Knowledge Discovery from Data Streams, 2006 Conclusion We have proposed STREAM-DETECT algorithm for identifying change in data stream distribution and/or domain values using online clustering. STREAM-DETECT is followed by a voting-based classification technique termed as CHANGE-CLASS that associates the current change with a previously encountered event Experimental results are promising for real-life applications to use this lightweight technique especially in wireless sensor networks.

Third International Workshop on Knowledge Discovery from Data Streams, 2006 References [Agg02] C. Aggarwal, An Intuitive Framework for Understanding Changes in Evolving Data Streams, Proceedings of the ICDE Conference, [Agg03] C. Aggarwal, A Framework for Diagnosing Changes in Evolving Data Streams, Proceedings of the ACM SIGMOD Conference, [BGK04] Ben-David, S, J. Gehrke and D. Kifer, Detecting Change in Data Streams, Proceedings of VLDB [BKP03] R. Bhargava, H. Kargupta, and M. Powers, Energy Consumption in Data Analysis for On-board and Distributed Applications, Proceedings of the ICML'03 workshop on Machine Learning Technologies for Autonomous Space Applications, [Fan04a] W. Fan, StreamMiner: A Classifier Ensemble-based Engine to Mine Concept Drifting Data Streams, VLDB'2004 [Fan04b] W. Fan, Systematic data selection to mine concept-drifting data streams. KDD 2004: [FHY04] W. Fan, Yi-an Huang, Philip S. Yu, Decision Tree Evolution Using Limited Number of Labeled Data Items from Drifting Data Streams. ICDM 2004: [HSD01] G. Hulten, L. Spencer, and P. Domingos, Mining Time-Changing Data Streams, ACM SIGKDD [GKZ05] Gaber, M, M., Krishnaswamy, S., and Zaslavsky, A., On-board Mining of Data Streams in Sensor Networks, a book chapter in Advanced Methods of Knowledge Discovery from Complex Data, (Eds.) S. Badhyopadhyay, U. Maulik, L. Holder and D. Cook, Springer Verlag, [GZK05] Gaber, M, M., Zaslavsky, A., and Krishnaswamy, S., Mining Data Streams: A Review, ACM SIGMOD Record, Vol. 34, No. 1, June 2005, ISSN: [Kli04] R. Klinkenberg,, Learning Drifting Concepts: Example Selection vs. Example Weighting. In Intelligent Data Analysis (IDA), Special Issue on Incremental Learning Systems Capable of Dealing with Concept Drift, Vol. 8, No. 3, pages , [Mut03] S. Muthukrishnan (2003), Data Streams: Algorithms and Applications, Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms. [NCR03] O. Nasraoui, C. Cardona, C. Rojas, F. González, TECNO-STREAMS: Tracking Evolving Clusters in Noisy Data Streams with a Scalable Immune System Learning Model, in Proc. of Third IEEE International Conference on Data Mining (ICDM'03), Melbourne, FL, November 2003, pp [TuM03] M. Tubaishat and S. Madria. Sensor Networks: an Overview. IEEE Potentials, Vol.22, Issue 2, Apr-May pages:20-23.Sensor Networks: an Overview. [WFY03] H. Wang, W. Fan, P. Yu and J. Han, Mining Concept-Drifting Data Streams using Ensemble Classifiers, in the 9th ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), Aug. 2003, Washington DC, USA. [WPY05] H. Wang, J. Pei, and P. S. Yu, Online Mining of Data Streams: Problems, Applications and Progress (tutorial), in the 21st International Conference on Data Engineering (ICDE), April, 2005, Tokyo, Japan. [WZF03] K. Wang, S. Zhou, A. Fu, J. Yu. Mining changes of Classification by Correspondence Tracing. SIAM International Conference on Data Mining 2003, May 2003, San Francisco.