
1 Data Stream Mining with Extensible Markov Model Yu Meng, Margaret H. Dunham, F. Marco Marchetti, Jie Huang, Charlie Isaksson October 18, 2006

2 Outline
 Data Stream Mining
 EMM Framework
 EMM Applications
 Future Work
 Conclusions

3 Data Mining
Data mining is the process of automatically searching large volumes of data for nontrivial, hidden, previously unknown, and potentially useful information (interrelations in the data). It is also called Knowledge Discovery in Databases (KDD) or Knowledge Discovery and Data Mining.
 Classification (Yahoo news, finance, etc.)
 Clustering (types of customers in online purchasing)
 Association (Market Basket Analysis)

4 Classification
 Given a collection of records (the training set), where each record contains a set of attributes and one of the attributes is the class, find a model for the class attribute as a function of the values of the other attributes.
 Goal: previously unseen records should be assigned a class as accurately as possible. A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets: the training set is used to build the model and the test set to validate it (see the sketch below).
 Examples: decision trees, neural networks, naïve Bayes, etc.
 Classification is a supervised learning process.
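Not part of the original deck: a minimal Python sketch of the train/test workflow just described, assuming scikit-learn and its bundled iris data.

```python
# Illustrative sketch (not from the slides): train/test split with a decision
# tree. Assumes scikit-learn is installed; the iris data stands in for any
# collection of records with attributes and a class.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)                        # attributes + class
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)                 # training/test split
model = DecisionTreeClassifier().fit(X_train, y_train)   # build on training set
print(accuracy_score(y_test, model.predict(X_test)))     # validate on test set
```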

5 Illustrating Classification Task

6 Clustering
 Finding groups of objects such that the objects in a group are similar (or related) to one another and different from (or unrelated to) the objects in other groups: intra-cluster distances are minimized, inter-cluster distances are maximized.
 Clustering is an unsupervised learning process.
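An illustrative aside (not in the deck): a minimal k-means run, assuming scikit-learn, showing groups whose members sit close to their own centroid and far from the other cluster.

```python
# Illustrative sketch (not from the slides): k-means picks clusters so that
# intra-cluster distances are small and inter-cluster distances are large.
import numpy as np
from sklearn.cluster import KMeans

points = np.array([[1.0, 1.0], [1.2, 0.9], [8.0, 8.0], [8.1, 7.9], [7.8, 8.2]])
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print(km.labels_)           # cluster membership of each point
print(km.cluster_centers_)  # one representative centroid per cluster
```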

7 Association Rule Mining
 Given a set of transactions, find rules that predict the occurrence of an item based on the occurrences of other items in the transaction.
 Market-basket transactions; example association rules: {Diaper} → {Beer}, {Milk, Bread} → {Eggs, Coke}, {Beer, Bread} → {Milk}.
 Implication means co-occurrence, not causality!
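An illustrative sketch (not in the deck) of the two standard rule measures, support and confidence, computed over the classic five-transaction market-basket sample from which rules like {Diaper} → {Beer} are drawn.

```python
# Illustrative sketch (not from the slides): support and confidence of an
# association rule over market-basket transactions.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support(itemset):
    """Fraction of transactions containing every item in the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs):
    """Co-occurrence strength of lhs -> rhs; says nothing about causality."""
    return support(lhs | rhs) / support(lhs)

print(support({"Diaper", "Beer"}))       # 0.6
print(confidence({"Diaper"}, {"Beer"}))  # 0.75
```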

8 Why Data Stream Mining?
A growing number of applications generate streams of data:
 Computer network monitoring data (IEPM-BW 2004, Abilene 2005)
 Call detail records in telecommunications (Cisco VoIP data 2003)
 Highway transportation traffic data (MnDot 2005)
 Online web purchase log records (JCPenney data 2003)
 Sensor network data (Ouse, Serwent 2002)
 Stock exchanges, transactions in retail chains, ATM operations in banks, credit card transactions

9 What do we see in the data streams?
Characteristics of a data stream:
 Records may arrive at a rapid rate
 High volume (possibly infinite) of continuous data
 Concept drift: the data distribution changes on the fly
 Data are raw
 Multidimensional
 Spatiality and temporality

10 What do we see in the data streams?
Requirements:
 Highly efficient computation and processing of the input streams in terms of both time and space: soft real-time performance and scalability.
 “Seek needles in a haystack”: rare event detection. [Haixun Wang, Jian Pei, Philip S. Yu, ICDE 2005; Keogh, ICDM’04]

11 What do we see in the data streams?
Stream processing restrictions:
 Single pass: each record is examined at most once
 Bounded storage: only limited memory may be used
 Real time: per-record processing time must be low
 Incremental responses to queries
Our solution (sketched below):
 Data modeling (a global synopsis)
 Mining of local patterns based on the synopsis
 Incremental, scalable algorithms
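Not in the deck: a minimal sketch of the single-pass, bounded-storage discipline, with a running mean standing in for a real synopsis such as EMM.

```python
# Illustrative sketch (not from the slides): each record is examined once,
# folded into O(1) state, and queries are answered incrementally.
def process_stream(stream):
    count, total = 0, 0.0            # bounded storage: constant-size synopsis
    for record in stream:            # single pass: each record seen once
        count += 1                   # O(1) incremental update per record
        total += record
        yield total / count          # incremental response to a query

for answer in process_stream([18.63, 17.6, 16.0, 14.62]):
    print(answer)
```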

12 Extensible Markov Model
Goal: develop a new data mining framework that models spatiotemporal data streams and mines interesting local patterns.
Assumptions about the data:
 Data are collected in discrete time intervals
 Data are in a structured format
 Data are multidimensional
 Data hold an approximation of the Markov property

13 Extensible Markov Model
Capabilities of the technique:
 Soft real-time processing (incremental)
 Global modeling capability (scalable synopsis)
 Local pattern finding capability (mining performed on the synopsis)
 Adaptive to concept changes
 Rare event detection

14 Outline
 Introduction
 EMM Framework
 EMM Applications
 Future Work
 Conclusions

15 EMM: An Overview
Motivation for EMM:
 A Markov process is a random process satisfying the Markov property; a Markov chain is a Markov process with discrete states.
 Clustering -> determine representative granules in the data space
 Static Markov chain -> dynamic Markov chain
 Map each cluster to a state in the Markov chain
What is EMM: a data mining framework that models spatiotemporal data streams and is employed for local pattern detection. EMM models the data stream by interleaving a clustering algorithm with a dynamic Markov chain, and applies a series of efficient algorithms to mine interesting patterns from the modeled data (the synopsis).

16 EMM Overview
EMM can use any clustering algorithm; it performs learning incrementally and is able to perform application computations simultaneously. The selection among these algorithms depends solely on hypotheses about the data profile.
EMM Clustering Algorithms
 Nearest neighbor O(m)
 Hierarchical clustering O(log m)
EMM Building Algorithms O(1)
 EMMIncrement algorithm
 EMMDecrement algorithm
 EMMMerge algorithm
 EMMSplit algorithm
EMM Application Algorithms O(1)
 Prediction
 Anomaly detection
 Risk assessment
 Emerging event finding
A minimal sketch of nearest-neighbor EMM building follows below.
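Not in the deck: a minimal Python sketch of nearest-neighbor clustering interleaved with a dynamic Markov chain, in the spirit of EMMIncrement. The class name, fields, and threshold test are assumptions for illustration, not the authors' exact implementation.

```python
# Illustrative sketch (not from the slides) of EMM building: match each new
# point to the nearest existing state within a distance threshold, otherwise
# create a new state; then update node counts (CN) and link counts (CL).
import numpy as np

class EMM:
    def __init__(self, threshold):
        self.threshold = threshold
        self.centroids = []      # one representative point per state
        self.node_count = []     # CN[i]: number of visits to state i
        self.link_count = {}     # CL[(i, j)]: transitions from state i to j
        self.current = None      # index of the current state

    def increment(self, point):
        point = np.asarray(point, dtype=float)
        # Nearest-neighbor clustering: O(m) scan over the m existing states.
        dists = [np.linalg.norm(point - c) for c in self.centroids]
        if dists and min(dists) <= self.threshold:
            state = int(np.argmin(dists))        # reuse an existing state
        else:
            state = len(self.centroids)          # extend the Markov chain
            self.centroids.append(point)
            self.node_count.append(0)
        self.node_count[state] += 1
        if self.current is not None:             # O(1) transition update
            key = (self.current, state)
            self.link_count[key] = self.link_count.get(key, 0) + 1
        self.current = state
        return state
```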

17 EMM Components and Workflow - Flexibility - Modularization - It models while executing applications

18 EMM Building / EMM – A Walk Through
EMM Clustering maps each input to a state:

Input  Attr 1  Attr 2  Attr 3  Attr 4  Attr 5  Attr 6  Attr 7  -> State
1      18.63   10.97   3.179   3.803   1.239   0.718   0.137   -> N1
2      17.6    10.81   2.989   3.741   1.497   0.661   0.135   -> N1
3      16      9.503   2.685   3.432   1.169   0.594   0.125   -> N2
4      14.62   8.966   2.561   3.296   1.01    0.56    0.116   -> N3
5      14.62   8.32    2.409   3.107   0.915   0.512   0.114   -> N3
6      18.73   10.37   3.19    3.83    1.39    1.18    0.13    -> N1

19 EMM Building / EMM – A Walk Through
EMM Clustering (same input table as slide 18). CN1 = 1, CL11 = 1.

20 EMM Building / EMM – A Walk Through
EMM Clustering (same input table as slide 18). CN1 = 1, CL11 = 1.

21 EMM Building / EMM Applications / EMM – A Walk Through
EMM Clustering (same input table as slide 18).

22 EMM Building / EMM Applications / EMM – A Walk Through
EMM Clustering (same input table as slide 18).

23 EMM Building / EMM Applications / EMM – A Walk Through
EMM Clustering (same input table as slide 18).

24 More Issues in EMM
 Label of nodes: cluster feature (LS), medoid, or centroid
 Label of links
 Calibration of cluster granularity: determine the threshold using the Markov property; parameter-free modeling [Keogh, KDD04]
(Figure 65.5: RMS error for prediction on the Serwent dataset)

25 Modeling Performance
 Growth rate of EMM states (Matlab as a testbed): the number of states grows sublinearly and the growth rate decreases; memory usage is 0.02–0.04% of the data size for the Ouse, Serwent, and MnDot datasets.
 Time efficiency: clustering O(m) vs. O(log m); Markov chain O(1)
 Continued learning

26 Outline
 Introduction
 EMM Framework
 EMM Applications: anomaly detection, risk assessment, emerging event finding
 Future Work
 Conclusions

27 EMM Application: Anomaly Detection
 Problem: compare a synopsis representing “normal” behavior to actual behavior; any deviation is flagged as a potentially interesting pattern. Also known as the Positive Security Model [http://www.imperva.com]: assume that everything deviating from normal is bad.
 Methodology: concepts and rules; cardinality of nodes and links; normalized occurrence frequency and normalized transition probability (see the sketch below)
 Performance metric: detection rate = TP/(TP+FN)
 Plus: has the potential to detect interesting patterns of all kinds, including “unknown” patterns
 Minus: can lead to a high false alarm rate
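Not in the deck: a hedged sketch of one such rule, flagging a transition whose normalized transition probability or whose target state's normalized occurrence frequency falls below a threshold. It builds on the EMM sketch after slide 16; the threshold values are illustrative, not the authors'.

```python
# Illustrative sketch (not from the slides): an observed transition src -> dst
# is anomalous when it is rare relative to the learned synopsis.
def is_anomalous(emm, src, dst, p_min=0.05, f_min=0.05):
    # normalized transition probability of src -> dst among src's outgoing links
    total_out = sum(c for (i, _), c in emm.link_count.items() if i == src)
    p = emm.link_count.get((src, dst), 0) / total_out if total_out else 0.0
    # normalized occurrence frequency of the target state
    f = emm.node_count[dst] / sum(emm.node_count)
    return p < p_min or f < f_min
```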

28 EMM Application: Anomaly Detection

29 EMM Application: Anomaly Detection

30 EMM Application: Risk Assessment
 Problem: mitigate the false alarm rate while maintaining a high detection rate. “98% of the alarm incidents in most communities are false alarms, which distracts law enforcement from real public safety responses.” [Gary Purvis, http://www.falsealarmreduction.com/]
 Methodology: historical feedback can be used as a free resource to screen out anomalies that are probably safe; combine the anomaly detection model with users’ feedback; risk level index (a hypothetical sketch follows below)
 Evaluation metrics: detection rate = TP/(TP+FN); false alarm rate = FP/(TP+FP)
 Results and discussions
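The deck does not give the risk index formula; the following is a purely hypothetical sketch of how user feedback could discount an anomaly score, named and parameterized here for illustration only.

```python
# Hypothetical sketch (the slides do not specify this formula): lower the risk
# level of an anomaly that users have repeatedly marked as a false alarm.
def risk_level(anomaly_score, safe_flags, occurrences, alpha=0.5):
    # fraction of past occurrences that users flagged as safe
    safe_ratio = safe_flags / occurrences if occurrences else 0.0
    return anomaly_score * (1.0 - alpha * safe_ratio)

print(risk_level(0.9, safe_flags=8, occurrences=10))  # 0.9 * (1 - 0.4) = 0.54
```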

31 EMM Application: Risk Assessment

32 EMM Application: Risk Assessment

33 EMM Application: Risk Assessment

34 EMM Application: Emerging Events
 Problem: model dynamically changing spatiotemporal data series and find emerging events that represent new and significant trends. How to delete obsolete nodes? How to identify a new trend at an early time?
 Methodology: sliding window (EMMDelete); decay of importance (aging score); extended cluster feature; extended transition labeling; emerging events
 Results and discussions
 O(1)

35 EMM Application: Emerging Events
Example: a node visited at five time points, with older visits weighted less: decayed weights 0.3, 0.4, 0.6, 0.7, 1.0, so the raw count is CN = 5 while the aging score is S(t) = 0.3 + 0.4 + 0.6 + 0.7 + 1.0 = 3.0.
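Not in the deck: a sketch of the aging score in the example above. The slides do not give the decay function; the linear decay 1 - 0.1*age used here is an assumption chosen only because it reproduces the slide's weights.

```python
# Illustrative sketch (not from the slides): each visit contributes a weight
# that decays with its age, so old activity fades from the score.
def aging_score(visit_ages, decay):
    return sum(decay(age) for age in visit_ages)

ages = [7, 6, 4, 3, 0]                              # ages of the five visits
score = aging_score(ages, lambda a: 1 - 0.1 * a)    # assumed linear decay
print(len(ages), round(score, 2))                   # CN = 5, S(t) = 3.0
```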

36 EMM Application: Emerging Events

37 Outline
 Introduction
 EMM Framework
 EMM Applications
 Future Work
 Conclusions

38 Future Work: Adaptive EMM
Motivation: modeling a dynamically changing data profile requires changing the cluster granularity.
Proposed methodology: a local ensemble of EMMs
 One main EMM and two ancillary EMMs (with fewer descriptors)
 Compare the performance of the three EMMs
 Switch the main EMM
 Create a new ancillary EMM based on the new main EMM (faster time to maturity)
New algorithms are needed: EMMSplit, EMMMerge

39 Future Work: Hierarchical EMM
Hierarchical EMM: the logical geographic area under consideration is divided into virtual regions, and a high-level EMM is an agglomeration of lower-level EMMs.
 Parallel EMM: a high-level EMM is a summary of lower-level EMMs with the same features/attributes
 Heterogeneous EMM: a lower-level EMM is a feature of the higher-level EMM
 Recursive EMM: a lower-level EMM represents one or several sub-states of the higher-level EMM

40 Conclusions
 EMM is an efficient, modularized, flexible data mining framework suitable for spatiotemporal data stream processing
 It supports a series of applications
 EMM aligns with current research trends and in-demand techniques
 EMM is innovative
 List of publications follows

41 Related Publications
 Yu Meng and Margaret H. Dunham, “Mining Developing Trends of Dynamic Spatiotemporal Data Streams,” Journal of Computers, Vol. 1, No. 3, Academy Publisher, 2006.
 Charlie Isaksson, Yu Meng, and Margaret H. Dunham, “Risk Leveling of Network Traffic Anomalies,” International Journal of Computer Science and Network Security (IJCSNS), Vol. 6, No. 6, 2006.
 Yu Meng and Margaret H. Dunham, “Online Mining of Risk Level of Traffic Anomalies with User’s Feedbacks,” in Proceedings of the Second IEEE International Conference on Granular Computing (GrC’06), Atlanta, GA, May 10–12, 2006.
 Y. Meng, M.H. Dunham, F.M. Marchetti, and J. Huang, “Rare Event Detection in a Spatiotemporal Environment,” in Proceedings of the Second IEEE International Conference on Granular Computing (GrC’06), Atlanta, GA, May 10–12, 2006.
 Yu Meng and Margaret H. Dunham, “Efficient Mining of Emerging Events in a Dynamic Spatiotemporal Environment,” in Proceedings of the Tenth Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD 2006), Singapore, April 9–12, 2006, Springer LNCS Vol. 3918.
 M.H. Dunham, Y. Meng, and J. Huang, “Extensible Markov Model,” in Proceedings of the 4th IEEE International Conference on Data Mining (ICDM’04), Brighton, UK, November 1–4, 2004.

42 Thank you. Questions?

